Methods for Genotyping

ABSTRACT

Novel methods and kits are disclosed for reducing the complexity of a nucleic acid sample to interrogate a collection of target sequences, for example, to discriminate between alleles at polymorphic positions in a genome. Complexity reduction can be accomplished by extension of capture probes followed by amplification of the extended capture probes using common primers. The capture probes may be locus-specific and allele-specific. The amplified sample may be hybridized to an array designed to interrogate the desired fragments for the presence or absence of a polymorphism. In some aspects the methods employ allele-specific extension of oligonucleotides that are complementary to one of the alleles at the 3′ end of the oligonucleotide. The allele-specific oligonucleotides are resistant to proofreading activity from a polymerase and may be extended in an allele-specific manner by a DNA polymerase with a functional 3′ to 5′ exonuclease activity.

RELATED APPLICATIONS

This application is a continuation-in-part of U.S. application Ser. No. 11/614,948, filed Dec. 21, 2006, which claims the priority of U.S. Provisional Application No. 60/752,782, filed Dec. 21, 2005 and is a continuation-in-part of U.S. application Ser. No. 11/133,750, filed on May 19, 2005, now abandoned, which is a continuation of U.S. application Ser. No. 11/022,099, filed on Dec. 23, 2004, now abandoned, which is a divisional of U.S. application Ser. No. 10/272,155, filed on Oct. 14, 2002, now U.S. Pat. No. 7,108,976, which claims the priority of U.S. Provisional Application No. 60/389,747, filed on Jun. 17, 2002. Each of these applications is incorporated herein in its entirety by reference for all purposes.

FIELD OF THE INVENTION

The invention relates to methods for selectively reducing the complexity of a nucleic acid sample and for determining the genotype of one or more polymorphisms using allele-specific primer extension.

BACKGROUND OF THE INVENTION

The past years have seen a dynamic change in the ability of science to comprehend vast amounts of data. Pioneering technologies such as nucleic acid arrays allow scientists to delve into the world of genetics in far greater detail than ever before. Exploration of genomic DNA has long been a dream of the scientific community. Held within the complex structures of genomic DNA lies the potential to identify, diagnose, or treat diseases like cancer, Alzheimer disease or alcoholism. Exploitation of genomic information from plants and animals may also provide answers to the world's food distribution problems.

Recent efforts in the scientific community, such as the publication of the draft sequence of the human genome in February 2001, have changed the dream of genome exploration into a reality. Genome-wide assays, however, must contend with the complexity of genomes; the human genome, for example, is estimated to have a complexity of 3×10⁹ base pairs. Novel methods of sample preparation and sample analysis that reduce complexity may provide for the fast and cost-effective exploration of complex samples of nucleic acids, particularly genomic DNA.

Single nucleotide polymorphisms (SNPs) have emerged as the marker of choice for genome-wide association studies and genetic linkage studies. Building SNP maps of the genome will provide the framework for new studies to identify the underlying genetic basis of complex diseases such as cancer, mental illness and diabetes as well as normal phenotypic variation. Due to the wide-ranging applications of SNPs there is a continued need for the development of increasingly robust, flexible, and cost-effective technology platforms that allow for genotype scoring of many polymorphisms in large numbers of samples.

Allele-specific primer extension is one method of analysis of point mutations (Newton et al., Nucleic Acids Res., 17, 2503-2516 (1989)). For SNP genotyping the method uses two allele-specific extension primers that differ in their 3′-positions. Each primer matches one allele perfectly but has a 3′ mismatch with the other allele. The DNA polymerase has much higher extension efficiency for the perfect match than for the mismatch.
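The primer design described above can be sketched in a few lines of Python. This is a minimal illustration, not the inventors' implementation: the function names `design_allele_specific_primers` and `is_extendable` and the example flank are hypothetical, and extension is modeled as an all-or-nothing test on the 3′ base, whereas a real polymerase merely extends the mismatched primer with much lower efficiency.

```python
def design_allele_specific_primers(flank_5prime: str, alleles: tuple) -> dict:
    """Return one primer per allele: a shared flank plus the allele base at the 3' end.

    Assumption: the flank is written 5'->3' on the same strand as the alleles,
    immediately upstream of the polymorphic position.
    """
    return {allele: flank_5prime + allele for allele in alleles}


def is_extendable(primer: str, sample_allele: str) -> bool:
    """Idealized model: a primer extends only if its 3' base matches the sample allele."""
    return primer[-1] == sample_allele


# Hypothetical A/G SNP with an 8-base upstream flank.
primers = design_allele_specific_primers("ACGTTGCA", ("A", "G"))
for allele, primer in primers.items():
    print(allele, primer, is_extendable(primer, "A"))
```

For a sample homozygous for the A allele, only the primer ending in A is reported as extendable in this model.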

SUMMARY OF THE INVENTION

The present invention provides for novel methods of sample preparation and analysis comprising managing or reducing the complexity of a nucleic acid sample by amplification of a collection of target sequences using target-specific capture probes. In some embodiments the extended capture probes are attached to a solid support such as beads; in some embodiments the extended capture probes are attached to an array. In some embodiments the amplified collection of target sequences is analyzed by hybridization to an array that is designed to interrogate sequence variation in the target sequences. In some embodiments the amplified collection of target sequences is analyzed by hybridization to an array of tag probes.

In one embodiment a method of generating a collection of target sequences from a nucleic acid sample is disclosed. The nucleic acid sample is fragmented to generate a plurality of fragments. A collection of capture probes is hybridized to the fragments wherein the capture probes are attached to a solid support at a 5′ end and comprise a spacer sequence near the 5′ end, multiple dU residues, a tag sequence for each species of capture probe, a target sequence, and the 3′ end of the capture probes terminates with a specific nucleotide corresponding to the polymorphism. The capture probes are extended in the presence of one or more DNA polymerases. The solid support is washed to remove the fragments. The extended capture probes are cleaved from the solid support via photo or enzymatic cleavage. The tag sequence of the extended capture probes is hybridized to an array comprising a plurality of tag probes. The target sequences are generated containing the polymorphism on the solid support.

In some embodiments the capture probes are attached to the solid support through a covalent interaction. In another embodiment the solid support comprises a plurality of beads. In some embodiments the beads further comprise anti-digoxigenin, thereby capturing the capture probes with a digoxigenin label at the 5′ end.

In another embodiment the DNA polymerase has a 3′ to 5′ exonuclease proofreading activity. In some embodiments the DNA polymerase comprises a mesophilic polymerase. In another embodiment, the DNA polymerase comprises TAQ GOLD polymerase (Life Technologies), VENT polymerase (New England Biolabs), DEEP VENT polymerase (New England Biolabs), T4 DNA Polymerase, E. coli Klenow fragment, or T7 DNA polymerase. In some embodiments, the beads are washed with 0.15N NaOH to remove fragments of the nucleic acid sample.

In another embodiment the enzymatic cleavage of the extended capture probes from the solid support uses an endonuclease. In some embodiments the endonuclease comprises uracil DNA glycosylase (UDG). In another embodiment, the enzymatic cleavage of the extended capture probes from the solid support is by heat. In some embodiments, the photo cleavage is accomplished by UV light. In another embodiment, the UV light has a wavelength between 200 and 400 nanometers.

In some embodiments, in each extension reaction there is at least one species of labeled dNTP. In another embodiment, one or more species of dNTPs is labeled with biotin. In some embodiments, the labeled nucleotides are incorporated into the extended capture probes. In another embodiment there is one extension reaction wherein four differentially labeled dNTPs are present in the extension reaction.

In some embodiments the capture probes are extended on a solid support in a 5′ to 3′ direction. In another embodiment the spacer sequence comprises a run of 2 to 12 T residues. In some embodiments the dU region comprises UIUI or UIUIUI. In another embodiment, Endonuclease V cleaves dI residues. In some embodiments, the 3′ end of the capture probes comprises 0, 1, 2 or 3 phosphorothioate linkages.
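The capture probe layout described in this summary (a 5′ spacer, a cleavable dU/dI region, a tag, a target-specific sequence, and an allele-specific 3′ base) can be sketched as a small data structure. The class name `CaptureProbe`, the helper `build_allele_probes`, and the tag and locus sequences below are hypothetical and chosen only for illustration; the spacer length limits and the UIUI motif are taken from the text, and phosphorothioate linkages are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureProbe:
    spacer: str       # run of T residues near the 5' end
    cleavable: str    # dU/dI region, e.g. "UIUI"
    tag: str          # tag sequence unique to this probe species
    locus: str        # target-specific sequence upstream of the SNP
    allele_base: str  # 3'-terminal base corresponding to one allele

    def sequence(self) -> str:
        """Probe written 5'->3' (the 5' end is the support-attached end)."""
        return self.spacer + self.cleavable + self.tag + self.locus + self.allele_base


def build_allele_probes(locus: str, alleles_to_tags: dict,
                        spacer_len: int = 6, cleavable: str = "UIUI") -> dict:
    """Build one probe per allele, each with its own tag (illustrative only)."""
    if not 2 <= spacer_len <= 12:
        raise ValueError("spacer is described as a run of 2 to 12 T residues")
    return {
        allele: CaptureProbe("T" * spacer_len, cleavable, tag, locus, allele)
        for allele, tag in alleles_to_tags.items()
    }


probes = build_allele_probes("ACGTTGCA", {"A": "GATTACAG", "G": "CCATGGTA"})
for allele, probe in probes.items():
    print(allele, probe.sequence())
```

Giving each allele-specific probe its own tag is what lets a tag array distinguish which allele was extended.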

In another embodiment, a method for genotyping one or more polymorphisms in a nucleic acid sample is disclosed. A collection of target sequences from the sample is generated. The collection of target sequences is hybridized to an array comprising a plurality of tag probes that hybridize to the tag sequences in the extended capture probes. The hybridization pattern is analyzed on each of the arrays to determine at least one genotype.
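One simple way to turn a tag-array hybridization pattern into a genotype is to compare the signals at the two tags assigned to the two alleles of a SNP. The sketch below is an assumption for illustration only: the function name `call_genotype`, the minimum-signal cutoff, and the heterozygote window are hypothetical values, not parameters given in this disclosure.

```python
def call_genotype(signal_a: float, signal_b: float,
                  min_signal: float = 100.0,
                  het_low: float = 0.3, het_high: float = 0.7) -> str:
    """Call AA, AB, BB or NoCall from the two allele-specific tag signals.

    All thresholds are illustrative assumptions, not values from the source.
    """
    total = signal_a + signal_b
    if total < min_signal:
        return "NoCall"          # too little signal at either tag
    fraction_a = signal_a / total
    if fraction_a >= het_high:
        return "AA"
    if fraction_a <= het_low:
        return "BB"
    return "AB"


for pair in [(900.0, 50.0), (40.0, 800.0), (500.0, 450.0), (20.0, 30.0)]:
    print(pair, call_genotype(*pair))
```

In practice the thresholds would be tuned per probe set, or replaced by a clustering approach across many samples.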

In one embodiment, a method of generating a collection of target sequences containing one or more polymorphisms from a nucleic acid sample is disclosed. A collection of capture probes is synthesized comprising a plurality of different species of primers wherein a 5′ end of the collection of capture probes is attached to a solid support and a 3′ variable region is specific for a target sequence in the collection of target sequences and terminates with a specific nucleotide corresponding to the polymorphism. The nucleic acid sample is amplified with a whole genome amplification method. The nucleic acid sample is fragmented to generate fragments. The fragments are hybridized to the collection of capture probes on a solid support. The solid support is washed to remove the fragments. The target sequences containing the polymorphism are generated by extending the capture probes using DNA polymerases.

In some embodiments, the whole genome amplification method comprises multiple displacement amplification. In another embodiment the primer extension reaction comprises full or partial substitution of one or more labeled dNTPs. In some embodiments, the polymorphism comprises a SNP.

In another embodiment a method of genotyping one or more polymorphic locations in a sample is disclosed. A collection of target sequences from the sample is generated. The collection of target sequences is hybridized to an array designed to interrogate at least one polymorphic location in the collection of target sequences. The hybridization pattern is analyzed to determine the identity of an allele or alleles present at one or more polymorphic locations in the collection of target sequences.

In one embodiment a method of amplifying a collection of target sequences from a nucleic acid sample is disclosed. A collection of capture probes is generated that has different species of primers, each species including a first common sequence and a 3′ variable region that is specific for a target sequence in a collection of target sequences. Each target sequence is represented by at least one species of primer which hybridizes to the target sequence, and the collection of capture probes is attached to a solid support so that the 3′ end of the capture probes is available for extension. The nucleic acid sample is fragmented and an adaptor that has a second common sequence is ligated to the fragments. The adaptor-ligated fragments are hybridized to the collection of capture probes and the capture probes are extended using the hybridized adaptor-ligated fragments as template for extension, thereby incorporating the target sequence and the second common sequence into the 3′ end of the extended capture probe. The extended capture probes are then amplified using first and second common sequence primers.

In some embodiments the capture probes are attached to the solid support through a covalent interaction. In another embodiment there is a tag sequence in the capture probes that is unique for each species of capture probe and the capture probes are attached to the solid support by hybridization to a collection of tag probes that are covalently attached to the solid support. In some embodiments each species of capture probe is attached to the solid support in a discrete location.

In another embodiment the extended capture probes are released from the solid support prior to amplification. Prior to releasing the extended capture probes from the solid support, nucleic acids that are not covalently attached to the solid support may be removed.

In another embodiment the extended capture probes are enriched prior to amplification. In some embodiments capture probes are enriched by incorporation of labeled nucleotides into the extended capture probes followed by isolation of labeled capture probes by affinity chromatography. In some embodiments capture probes are labeled with biotin and avidin, streptavidin or an anti-biotin antibody, which may be monoclonal, may be used to isolate extended capture probes. In another embodiment extended capture probes are made double stranded and single stranded nucleic acid in the sample is digested by, for example, a nuclease, such as, for example, Exonuclease I. In another embodiment the extended capture probes are circularized prior to amplification and uncircularized nucleic acid in the sample is digested by, for example, a nuclease, such as, for example, Exonuclease III. In some embodiments the extended capture probes are circularized by hybridizing an oligonucleotide splint to the extended capture probes so that the 5′ and 3′ ends of extended capture probes are juxtaposed and then ligating the ends of the extended capture probes.
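The splint used for circularization must be complementary to both ends of the extended capture probe so that the 3′ end is brought next to the 5′ end. A minimal sketch of that design step follows; the function names `reverse_complement` and `design_splint`, the arm length, and the example sequence are hypothetical, and real splints would use longer arms chosen for melting temperature.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_splint(extended_probe: str, arm: int = 4) -> str:
    """Return a splint (5'->3') that pairs with the last `arm` bases of the
    probe followed by its first `arm` bases, juxtaposing the 3' and 5' ends."""
    if len(extended_probe) < 2 * arm:
        raise ValueError("probe is shorter than the two splint arms")
    junction = extended_probe[-arm:] + extended_probe[:arm]
    return reverse_complement(junction)


print(design_splint("ACGTAACCGGTTGACT", arm=4))
```

Once the splint holds the ends together, a ligase can close the circle, and linear molecules remain sensitive to exonuclease digestion.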

In one embodiment a method of genotyping one or more polymorphic locations in a sample is disclosed. An amplified collection of target sequences from the sample is prepared and hybridized to an array designed to interrogate at least one polymorphic location in the collection of target sequences. The hybridization pattern is analyzed to determine the identity of the allele or alleles present at one or more polymorphic locations in the collection of target sequences.

In another embodiment a method for analyzing sequence variations in a population of individuals is disclosed. A nucleic acid sample is obtained from each individual and a collection of target sequences from each nucleic acid sample is amplified. Each amplified collection of target sequences is hybridized to an array designed to interrogate sequence variation in the collection of target sequences to generate a hybridization pattern for each sample, and the hybridization patterns are analyzed or compared to determine the presence or absence of sequence variation in the population of individuals.

In another embodiment a method of amplifying a collection of target sequences from a nucleic acid sample in solution is disclosed. A collection of capture probes is generated. The collection includes different species of primers having a first common sequence and a 3′ variable region that is specific for a target sequence wherein each target sequence in a collection of target sequences is represented by at least one species of primer which hybridizes to the target sequence. The nucleic acid sample is fragmented and an adaptor is ligated to the fragments so that the strand that is ligated to the 5′ end of the fragment strands has a second common sequence and the strand that is ligated to the 3′ end of the fragments lacks the second common sequence and is blocked from extension at the 3′ end. The adaptor-ligated fragments are hybridized to the collection of capture probes and the capture probes are extended using the hybridized adaptor-ligated fragments as template for extension, thereby incorporating the target sequence and the complement of the second common sequence into the extended capture probes. The extended capture probes are then amplified with first and second common sequence primers.
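The extension step just described can be modeled on strings to show why only extended capture probes carry both priming sites. This is a simplified sketch under stated assumptions: the function names, the sequences, and exact-match "hybridization" via substring search are all hypothetical simplifications, and the template is taken to be a fragment strand whose 5′ end carries the second common sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extend_capture_probe(common1: str, variable: str, template: str):
    """Extend a probe (common1 + variable, 5'->3') along a template strand.

    The variable region anneals to its reverse complement in the template, and
    extension copies everything between that site and the template's 5' end.
    Returns None if the probe finds no binding site.
    """
    site = reverse_complement(variable)
    pos = template.find(site)
    if pos < 0:
        return None
    return common1 + variable + reverse_complement(template[:pos])


def is_amplifiable(product, common1: str, common2: str) -> bool:
    """A product is amplifiable by the common primer pair only if it starts with
    the first common sequence and ends with the complement of the second."""
    return (product is not None
            and product.startswith(common1)
            and product.endswith(reverse_complement(common2)))


A1, A2, VARIABLE = "GGGG", "TTCCA", "ACGTA"
template = A2 + "GATC" + reverse_complement(VARIABLE) + "CC"
product = extend_capture_probe(A1, VARIABLE, template)
print(product, is_amplifiable(product, A1, A2))
```

An unextended probe lacks the complement of the second common sequence, and an adaptor-ligated fragment lacks the first common sequence, so neither is exponentially amplified by the common primer pair in this model.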

In one embodiment an amino group is used to block extension at the 3′ end of the adaptor strand.

In another embodiment a method for genotyping one or more polymorphisms in a nucleic acid sample is disclosed. The nucleic acid sample is fragmented and an adaptor comprising a first common priming sequence is ligated to the fragments. A collection of capture probes is ligated to the fragments. The capture probes have a second common priming sequence, a tag sequence unique for each species of capture probe, a first locus-specific sequence, a Type IIs restriction enzyme recognition sequence, and a second locus-specific sequence. The Type IIs restriction enzyme recognition sequence is positioned so that the enzyme will cut immediately 5′ of the polymorphic base in a target sequence. The capture probes are extended to generate single-stranded extension products and then amplified using the first and second common sequence primers. The amplified product is digested with a Type IIs restriction enzyme and the fragments are extended in the presence of one or more types of labeled ddNTP. In one embodiment the extension is done in four separate reactions, one for each ddNTP, and the ddNTPs may be labeled with the same label. The extended fragments are then hybridized to four separate arrays. In another embodiment the ddNTPs are differentially labeled with at least two different labels and the extension reactions may be done in fewer than four reactions and each reaction may be hybridized to a separate array. The arrays are arrays of tag probes that hybridize to the tag sequences in the capture probes. The hybridization pattern on each of the arrays is analyzed to determine at least one genotype.
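In the four-reaction format described above, each tag yields one signal per ddNTP array, and the base or bases that account for a substantial share of the total signal indicate the allele or alleles present. The sketch below illustrates that comparison; the function name `call_from_four_arrays` and the 25% cutoff are hypothetical assumptions, not values from this disclosure.

```python
def call_from_four_arrays(signals: dict, min_fraction: float = 0.25) -> str:
    """Return the incorporated base(s) for one tag, e.g. "A" or "A/G".

    `signals` maps each ddNTP base to the intensity measured for this tag on
    the array hybridized with that base's extension reaction. The fraction
    cutoff is an illustrative assumption.
    """
    total = sum(signals.values())
    if total <= 0:
        return ""  # no usable signal for this tag
    called = sorted(base for base, value in signals.items()
                    if value / total >= min_fraction)
    return "/".join(called)


print(call_from_four_arrays({"A": 900.0, "C": 20.0, "G": 30.0, "T": 50.0}))
print(call_from_four_arrays({"A": 480.0, "C": 10.0, "G": 500.0, "T": 10.0}))
```

A single called base suggests a homozygous sample at that locus, and two called bases suggest a heterozygous sample.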

In another embodiment one of the common sequence primers is resistant to nuclease digestion and the sample is treated with a nuclease that cleaves 5′ to 3′ after the fragments are extended in the presence of labeled ddNTP. In one embodiment the primer is resistant to nuclease digestion because it contains phosphorothioate linkages. In some embodiments the nuclease is T7 Gene 6 Exonuclease. In some embodiments the ddNTPs are labeled with biotin.

In another embodiment a method for screening for sequence variations in a population of individuals is disclosed. A nucleic acid sample from each individual is provided, the sample is amplified and genotyped by one of the methods of the invention, and the genotypes from the samples are compared to determine the presence or absence of sequence variation in the population of individuals.

In another embodiment a kit for amplifying a collection of target sequences is disclosed. The kit has a collection of capture probes that is specific for a collection of target sequences and has a first common sequence that is common to all of the capture probes; an adaptor that has a second common sequence; and a pair of first and second common sequence primers. In another embodiment the collection of capture probes in the kit is covalently attached to a solid support so that the 3′ end of the capture probes is available for extension. In another embodiment the kit also provides a restriction enzyme, buffer, DNA polymerase and dNTPs. In some embodiments the restriction enzyme is a Type IIs restriction enzyme. In another embodiment the kit also contains a ligase, dNTPs, ddNTPs, buffer and DNA polymerase. In some embodiments one of the common sequence primers is resistant to nuclease digestion.

In another embodiment the capture probes also have a tag sequence unique for each species of capture probe and a Type IIs restriction enzyme recognition sequence. In another embodiment the adaptor has a first strand comprising a common sequence and a second strand that does not contain the complement of that common sequence and the second strand is blocked from extension at the 3′ end by, for example, an amino group.

In another embodiment a collection of capture probes attached to a solid support is disclosed. The solid support may be arrays, beads, microparticles, microtitre dishes or gels.

In another embodiment a plurality of oligonucleotides attached to a solid support is disclosed. The solid support may be arrays, beads, microparticles, microtitre dishes or gels. The oligonucleotides may be released and used for a variety of analyses. The plurality of oligonucleotides may include a collection of capture probes.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows a schematic of allele-specific primer extension. Allele-specific primers may be designed to hybridize to either strand.

FIG. 2 shows a bead-based allele-specific primer extension method.

FIG. 3 shows a method of detection of tagged extension products on an array of tag probes.

FIG. 4A shows an array of allele-specific probes with nuclease resistant linkages at the 3′ ends of the probes.

FIG. 4B shows genomic DNA hybridized to the array of allele-specific probes with complementarity between the polymorphic base and the 3′ end of the probes.

FIG. 4C shows genomic DNA hybridized to the array of allele-specific probes with a mismatch between the polymorphic base and the 3′ end of the probes.

FIG. 4D shows extension of the allele-specific probes with a perfect match at the 3′ end and absence of extension of allele-specific probes with a mismatch at the 3′ end.

FIG. 5 shows a method of amplifying specific target sequences using a capture probe that is locus specific and genomic DNA that has been ligated to an adaptor. The capture probes are attached to a solid support and extended to incorporate the sequence of interest and the adaptor sequence. The extended capture probes are released from the solid support and amplified with a single primer pair.

FIG. 6 shows a method where the capture probes are attached to a solid support by hybridization to a probe that is covalently attached to the solid support. The probes on the array are complementary to a tag sequence in the 5′ region of the capture probe. The capture probe hybridizes so that the 3′ end is available for extension.

FIG. 7 shows a schematic of solution-based multiplexed SNP genotyping. A sample is fragmented and ligated to an adaptor so that the adaptor sequence that hybridizes to the 3′ end of the strands of the fragments is blocked from extension. Locus-specific capture probes are hybridized to the fragments and extended in solution then amplified by PCR using primers to A1 and A2. Prior to amplification the extended capture probes may be enriched by, for example, removal of non-extended products or by positive selection of extended products.

FIGS. 8A and 8B show a method of multiplexed anchored runoff amplification wherein the alleles present at different polymorphic positions are analyzed by hybridization to an array of tag probes. The capture probe includes a recognition site for a Type IIs restriction enzyme so that the enzyme cuts immediately upstream of the polymorphic locus. The capture probe is extended by one labeled nucleotide and the identity of the nucleotide is determined by hybridization to an array of probes that are complementary to the tag sequences in the capture probes.

FIG. 9 shows an enrichment scheme. Biotin is incorporated into the extended capture probes and biotin labeled extended capture probes are selected by affinity chromatography.

FIG. 10 shows another enrichment scheme using nuclease that is specific for single stranded nucleic acid. Capture probes that are fully extended through the adaptor site on the genomic DNA fragment are converted to double stranded DNA by annealing and extension of a primer that hybridizes to the adaptor sequence.

FIG. 11 shows another enrichment scheme. The ends of the extended capture probes are ligated together to form a circle using a splint oligonucleotide that is complementary to the primer sites at the ends of the extended capture probes. The sample is digested with an exonuclease so circularized sequences are protected from digestion.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

(A) General

The present invention has many preferred embodiments and relies on many patents, applications and other references for details known to those of skill in the art. Therefore, when a patent, application, or other reference is cited or repeated below, it should be understood that it is incorporated by reference in its entirety for all purposes as well as for the proposition that is recited.

As used in this application, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “an agent” includes a plurality of agents, including mixtures thereof.

An individual is not limited to a human being but may also be other organisms including but not limited to mammals, plants, bacteria, or cells derived from any of the above.

Throughout this disclosure, various aspects of this invention can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. The same holds true for ranges in increments of 10⁵, 10⁴, 10³, 10², 10, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴, or 10⁻⁵, for example. This applies regardless of the breadth of the range.

The practice of the present invention may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and immunology, which are within the skill of the art. Such conventional techniques include polymer array synthesis, hybridization, ligation, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the example herein below. However, other equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), Using Antibodies: A Laboratory Manual, Cells: A Laboratory Manual, PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press), Stryer, Gait, “Oligonucleotide Synthesis: A Practical Approach” 1984, IRL Press, London, Nelson and Cox (2000), Lehninger, Principles of Biochemistry 3rd Ed., W.H. Freeman Pub., New York, N.Y. and Berg et al. (2002) Biochemistry, 5th Ed., W.H. Freeman Pub., New York, N.Y., all of which are herein incorporated in their entirety by reference for all purposes.

The present invention can employ solid substrates, including arrays in some preferred embodiments. Methods and techniques applicable to polymer (including protein) array synthesis have been described in U.S. Patent Publication No. 20050074787, International Publication No. WO 00/58516, U.S. Pat. Nos. 5,143,854, 5,242,974, 5,252,743, 5,324,633, 5,384,261, 5,424,186, 5,451,683, 5,482,867, 5,491,074, 5,527,681, 5,550,215, 5,571,639, 5,578,832, 5,593,839, 5,599,695, 5,624,711, 5,631,734, 5,795,716, 5,831,070, 5,837,832, 5,856,101, 5,858,659, 5,936,324, 5,968,740, 5,974,164, 5,981,185, 5,981,956, 6,025,601, 6,033,860, 6,040,193, 6,090,555, 6,136,269, 6,269,846 and 6,428,752, in PCT Applications Nos. PCT/US99/00730 (International Publication Number WO 99/36760) and PCT/US01/04285, which are all incorporated herein by reference in their entirety for all purposes.

Patents that describe synthesis techniques in specific embodiments include U.S. Pat. Nos. 5,412,087, 6,147,205, 6,262,216, 6,310,189, 5,889,165 and 5,959,098 which are each incorporated herein by reference in their entirety for all purposes. Nucleic acid arrays are described in many of the above patents, but the same techniques are applied to polypeptide arrays.

The present invention also contemplates many uses for polymers attached to solid substrates. These uses include gene expression monitoring, profiling, library screening, genotyping, and diagnostics. Gene expression monitoring and profiling methods are shown in U.S. Pat. Nos. 5,800,992, 6,013,449, 6,020,135, 6,033,860, 6,040,138, 6,177,248 and 6,309,822. Genotyping and uses therefor are shown in U.S. Pat. Nos. 5,856,092, 6,300,063, 5,858,659, 6,284,460, 6,361,947, 6,368,799, 6,333,179 and 6,872,529 which are each incorporated herein by reference. Other uses are embodied in U.S. Pat. Nos. 5,871,928, 5,902,723, 6,045,996, 5,541,061, and 6,197,506 which are incorporated herein by reference.

The present invention also contemplates sample preparation methods in certain preferred embodiments. For example, see the patents in the gene expression, profiling, genotyping and other use patents above, as well as U.S. Pat. Nos. 6,582,938, 5,437,990, 5,215,899, 5,466,586, and 4,357,421, and Gubler et al., 1985, Biochemica et Biophysica Acta, Displacement Synthesis of Globin Complementary DNA: Evidence for Sequence Amplification.

Prior to or concurrent with analysis, the nucleic acid sample may be amplified by a variety of mechanisms, some of which may employ PCR. See, e.g., PCR Technology: Principles and Applications for DNA Amplification (Ed. H. A. Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (Eds. Innis, et al., Academic Press, San Diego, Calif., 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (Eds. McPherson et al., IRL Press, Oxford); and U.S. Pat. Nos. 4,683,202, 4,683,195, 4,800,159, 4,965,188, and 5,333,675, each of which is incorporated herein by reference in its entirety for all purposes. The sample may be amplified on the array. See, for example, U.S. Pat. No. 6,300,070 which is incorporated herein by reference.

Other suitable amplification methods include the ligase chain reaction (LCR) (e.g., Wu and Wallace, Genomics 4, 560 (1989), Landegren et al., Science 241, 1077 (1988) and Barringer et al. Gene 89:117 (1990)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173 (1989) and WO88/10315), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci. USA, 87, 1874 (1990), WO88/10315 and WO90/06995), selective amplification of target polynucleotide sequences (U.S. Pat. No. 6,410,276), consensus sequence primed polymerase chain reaction (CP-PCR) (U.S. Pat. No. 4,437,975), arbitrarily primed polymerase chain reaction (AP-PCR) (U.S. Pat. Nos. 5,413,909, 5,861,245), degenerately primed PCR (DOP-PCR) (See, Telenius et al. Genomics 13: 718-725, 1992 and Cheung and Nelson Proc. Natl. Acad. Sci. 93: 14676-14679, 1996), primer extension PCR (PEP) (See, Zhang et al. Proc. Natl. Acad. Sci. 89: 5847-5851, 1992), and nucleic acid based sequence amplification (NABSA). (See, U.S. Pat. Nos. 5,409,818, 5,554,517, and 6,063,603, each of which is incorporated herein by reference). Other amplification methods that may be used are described in U.S. Pat. Nos. 6,582,938, 5,242,794, 5,494,810, and 4,988,617, each of which is incorporated herein by reference. In preferred aspects, the sample may be amplified using multiple-displacement amplification (MDA) or OMNIPLEX amplification. MDA uses a highly processive DNA polymerase and random exonuclease-resistant primers in an isothermal amplification reaction (Dean et al., Proc. Natl. Acad. Sci. 99: 5261-5266, 2002). The method is based on strand displacement synthesis and generally results in products that are greater than 10 kb in length. The OMNIPLEX amplification method uses random fragmentation of the DNA to form a library of fragments of defined size. The fragments can be amplified using a DNA polymerase (Langmore, Pharmacogenomics 3:557-60, 2002 and US Patent Pub. No. 20030040620).

Additional methods of sample preparation and techniques for reducing the complexity of a nucleic acid sample are described in Dong et al., Genome Research 11, 1418 (2001), in U.S. Pat. Nos. 6,300,070, 6,361,947, 6,391,592, 6,958,225, 6,632,611 and 6,872,529 and U.S. Patent Pub. No. 20050260628, which are incorporated herein by reference in their entireties.

The present invention also contemplates detection of hybridization between ligands in certain preferred embodiments. See U.S. Pat. Nos. 5,143,854; 5,578,832; 5,631,734; 5,834,758; 5,936,324; 5,981,956; 6,025,601; 6,141,096; 6,185,030; 6,201,639; 6,218,803; and 6,225,625 and PCT Application PCT/US99/06097 (published as WO99/47964), each of which also is hereby incorporated by reference in its entirety for all purposes.

The practice of the present invention may also employ conventional biology methods, software and systems. Computer software products of the invention typically include a computer readable medium having computer-executable instructions for performing the logic steps of the method of the invention. Suitable computer readable media include floppy disks, CD-ROM/DVD/DVD-ROM, hard-disk drives, flash memory, ROM/RAM, magnetic tapes, etc. The computer executable instructions may be written in a suitable computer language or combination of several languages. Basic computational biology methods are described in, e.g., Setubal and Meidanis et al., Introduction to Computational Biology Methods (PWS Publishing Company, Boston, 1997); Salzberg, Searles, Kasif, (Ed.), Computational Methods in Molecular Biology, (Elsevier, Amsterdam, 1998); Rashidi and Buehler, Bioinformatics Basics: Application in Biological Science and Medicine (CRC Press, London, 2000) and Ouellette and Baxevanis, Bioinformatics: A Practical Guide for Analysis of Gene and Proteins (Wiley & Sons, Inc., 2nd ed., 2001).

The present invention may also make use of various computer program products and software for a variety of purposes, such as probe design, management of data, analysis, and instrument operation. See, U.S. Pat. Nos. 5,593,839, 5,795,716, 5,733,729, 5,974,164, 6,066,454, 6,090,555, 6,185,561, 6,188,783, 6,223,127, 6,229,911 and 6,308,170. Computer methods related to genotyping using high density microarray analysis may also be used in the present methods, see, for example, US Patent Pub. Nos. 20050250151, 20050244883, 20050108197, 20050079536 and 20050042654.

Methods for analysis of genotype using array data are described, for example, in Di, X., et al. (2005) Bioinformatics, 21, 1958-1963, Liu, W., et al. (2003) Bioinformatics, 19, 2397-2403 and Rabbee and Speed (2006) Bioinformatics 22:7-12. Methods for copy number analysis based on hybridization to arrays of oligonucleotides have been disclosed, for example, in US Patent Pub. Nos. 20040157243, 20060134674, 20050130217, and 20050064476.

Additionally, the present invention may have preferred embodiments that include methods for providing genetic information over networks such as the Internet as shown in U.S. Patent applications 20020183936, 20030100995, 20030120432, 20040002818, and 20040049354.

The present invention provides a flexible and scalable method for analyzing complex samples of nucleic acids, such as genomic DNA. These methods are not limited to any particular type of nucleic acid sample: plant, bacterial, animal (including human) total genome DNA, RNA, cDNA and the like may be analyzed using some or all of the methods disclosed in this invention. The word “DNA” may be used below as an example of a nucleic acid. It is understood that this term includes all nucleic acids, such as DNA and RNA, unless a use below requires a specific type of nucleic acid. This invention provides a powerful tool for analysis of complex nucleic acid samples. From experimental design to isolation of desired fragments and hybridization to an appropriate array, the invention provides for fast, efficient and inexpensive methods of complex nucleic acid analysis.

(B) Definitions

An “array” comprises a support, preferably solid, with nucleic acid probes attached to the support. Preferred arrays typically comprise a plurality of different nucleic acid probes that are coupled to a surface of a substrate in different, known locations. These arrays, also described as “microarrays” or colloquially “chips”, have been generally described in the art, for example, in U.S. Pat. Nos. 5,143,854, 5,445,934, 5,744,305, 5,677,195, 5,800,992, 6,040,193, 5,424,186 and Fodor et al., Science, 251:767-777 (1991), each of which is incorporated by reference in its entirety for all purposes.

Arrays may generally be produced using a variety of techniques, such as mechanical synthesis methods or light directed synthesis methods that incorporate a combination of photolithographic methods and solid phase synthesis methods. Techniques for the synthesis of these arrays using mechanical synthesis methods are described in, e.g., U.S. Pat. Nos. 5,384,261, and 6,040,193, which are incorporated herein by reference in their entirety for all purposes. Although a planar array surface is preferred, the array may be fabricated on a surface of virtually any shape or even a multiplicity of surfaces. Arrays may be nucleic acids on beads, gels, polymeric surfaces, fibers such as fiber optics, glass or any other appropriate substrate. (See U.S. Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992, which are hereby incorporated by reference in their entirety for all purposes.)

Arrays may be packaged in such a manner as to allow for diagnostic use or can be an all-inclusive device; e.g., U.S. Pat. Nos. 5,856,174 and 5,922,591 incorporated in their entirety by reference for all purposes.

Preferred arrays are commercially available from Affymetrix under the brand name GeneChip® and are directed to a variety of purposes, including genotyping and gene expression monitoring for a variety of eukaryotic and prokaryotic species. (See Affymetrix Inc., Santa Clara and their website at affymetrix.com.)

Nucleic acids according to the present invention may include any polymer or oligomer of pyrimidine and purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively. (See Albert L. Lehninger, Principles of Biochemistry, at 793-800 (Worth Pub. 1982) which is herein incorporated in its entirety for all purposes). Indeed, the present invention contemplates any deoxyribonucleotide, ribonucleotide or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated or glucosylated forms of these bases, and the like. The polymers or oligomers may be heterogeneous or homogeneous in composition, and may be isolated from naturally occurring sources or may be artificially or synthetically produced. In addition, the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states.

An “oligonucleotide” or “polynucleotide” is a nucleic acid ranging from at least 2, preferably at least 8, 15 or 20 nucleotides in length, but may be up to 50, 100, 1000, or 5000 nucleotides long or a compound that specifically hybridizes to a polynucleotide. Polynucleotides of the present invention include sequences of deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) or mimetics thereof which may be isolated from natural sources, recombinantly produced or artificially synthesized. A further example of a polynucleotide of the present invention may be a peptide nucleic acid (PNA). (See U.S. Pat. No. 6,156,501 which is hereby incorporated by reference in its entirety.) The invention also encompasses situations in which there is a nontraditional base pairing such as Hoogsteen base pairing which has been identified in certain tRNA molecules and postulated to exist in a triple helix. “Polynucleotide” and “oligonucleotide” are used interchangeably in this application.

A genome is all the genetic material of an organism. In some instances, the term genome may refer to the chromosomal DNA. A genome may be multichromosomal such that the DNA is cellularly distributed among a plurality of individual chromosomes. For example, in human there are 22 pairs of chromosomes plus a gender associated XX or XY pair. DNA derived from the genetic material in the chromosomes of a particular organism is genomic DNA. The term genome may also refer to genetic materials from organisms that do not have chromosomal structure. In addition, the term genome may refer to mitochondrial DNA. A genomic library is a collection of DNA fragments representing the whole or a portion of a genome. Frequently, a genomic library is a collection of clones made from a set of randomly generated, sometimes overlapping DNA fragments representing the entire genome or a portion of the genome of an organism.

The term “chromosome” refers to the heredity-bearing gene carrier of a living cell which is derived from chromatin and which comprises DNA and protein components (especially histones). The conventional internationally recognized individual human genome chromosome numbering system is employed herein. The size of an individual chromosome can vary from one type to another within a given multi-chromosomal genome and from one genome to another. In the case of the human genome, the entire DNA mass of a given chromosome is usually greater than about 100,000,000 bp. For example, the size of the entire human genome is about 3×10⁹ bp. The largest chromosome, chromosome no. 1, contains about 2.4×10⁸ bp while the smallest chromosome, chromosome no. 22, contains about 5.3×10⁷ bp.

A “chromosomal region” is a portion of a chromosome. The actual physical size or extent of any individual chromosomal region can vary greatly. The term “region” is not necessarily definitive of a particular one or more genes because a region need not take into specific account the particular coding segments (exons) of an individual gene.

An “allele” refers to one specific form of a genetic sequence (such as a gene) within a cell, an individual or within a population, the specific form differing from other forms of the same gene in the sequence of at least one, and frequently more than one, variant sites within the sequence of the gene. The sequences at these variant sites that differ between different alleles are termed “variances”, “polymorphisms”, or “mutations”. At each autosomal specific chromosomal location or “locus” an individual possesses two alleles, one inherited from one parent and one from the other parent, for example one from the mother and one from the father. An individual is “heterozygous” at a locus if it has two different alleles at that locus. An individual is “homozygous” at a locus if it has two identical alleles at that locus.

The term “fragment,” “segment,” or “DNA segment” refers to a portion of a larger DNA polynucleotide or DNA. A polynucleotide, for example, can be broken up, or fragmented into, a plurality of segments. Various methods of fragmenting nucleic acid are well known in the art. These methods may be, for example, either chemical or physical in nature. Chemical fragmentation may include partial degradation with a DNase; partial depurination with acid; the use of restriction enzymes; intron-encoded endonucleases; DNA-based cleavage methods, such as triplex and hybrid formation methods, that rely on the specific hybridization of a nucleic acid segment to localize a cleavage agent to a specific location in the nucleic acid molecule; or other enzymes or compounds which cleave DNA at known or unknown locations (see, for example, U.S. Ser. No. 09/358,664). Physical fragmentation methods may involve subjecting the DNA to a high shear rate. High shear rates may be produced, for example, by moving DNA through a chamber or channel with pits or spikes, or forcing the DNA sample through a restricted size flow passage, e.g., an aperture having a cross sectional dimension in the micron or submicron scale. Other physical methods include sonication and nebulization. Combinations of physical and chemical fragmentation methods may likewise be employed such as fragmentation by heat and ion-mediated hydrolysis. See for example, Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 3rd Ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001) (“Sambrook et al.”) which is incorporated herein by reference for all purposes. These methods can be optimized to digest a nucleic acid into fragments of a selected size range. Useful size ranges may be from 100, 200, 400, 700 or 1000 to 500, 800, 1500, 2000, 4000 or 10,000 base pairs. However, larger size ranges such as 4000, 10,000 or 20,000 to 10,000, 20,000 or 500,000 base pairs may also be useful.

A number of methods disclosed herein require the use of restriction enzymes to fragment the nucleic acid sample. In general, a restriction enzyme recognizes a specific nucleotide sequence of four to eight nucleotides and cuts the DNA at a site within or a specific distance from the recognition sequence. For example, the restriction enzyme EcoRI recognizes the sequence GAATTC and will cut a DNA molecule between the G and the first A. The frequency of occurrence of the site in the genome decreases as the length of the recognition sequence increases. A simplistic theoretical estimate is that a six base pair recognition sequence will occur once in every 4096 (4⁶) base pairs while a four base pair recognition sequence will occur once every 256 (4⁴) base pairs. In silico digestions of sequences from the Human Genome Project show that the actual occurrences may be more or less frequent, depending on the sequence of the restriction site. Because the restriction sites are rare, the appearance of shorter restriction fragments, for example those less than 1000 base pairs, is much less frequent than the appearance of longer fragments. Many different restriction enzymes are known and appropriate restriction enzymes can be selected for a desired result. (For a description of many restriction enzymes see, New England BioLabs Catalog which is herein incorporated by reference in its entirety for all purposes).
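As a non-limiting illustration of the estimate above, the following Python sketch computes the naive expected spacing of a recognition site (4ⁿ base pairs for a site of n bases, assuming the four bases are equally frequent and independent) and performs a toy in silico digestion of one strand. The function names `expected_spacing` and `digest`, the `cut_offset` parameter, and the toy sequence are illustrative assumptions and are not part of the disclosed methods.

```python
import re


def expected_spacing(site_length):
    """Naive expected distance (bp) between sites of the given length.

    Assumes equal, independent base frequencies, as in the simplistic
    theoretical estimate; real genomes deviate from this.
    """
    return 4 ** site_length


def digest(sequence, site, cut_offset):
    """Cut the top strand of `sequence` at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, which cuts between the G and the first A).
    """
    sequence = sequence.upper()
    # A lookahead pattern also finds overlapping occurrences of the site.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy sequence (hypothetical) containing two EcoRI sites.
toy = "TTGAATTCAAGGGAATTCCC"
fragments = digest(toy, "GAATTC", 1)
print(expected_spacing(6), expected_spacing(4))
print(fragments)
```

Only the top strand is modeled here; the staggered cut on the complementary strand, which produces the single stranded overhang, is not represented.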

Type-IIs endonucleases are a class of endonuclease that, like other endonucleases, recognize specific sequences of nucleotide base pairs within a double stranded polynucleotide sequence. Upon recognizing that sequence, the endonuclease will cleave the polynucleotide sequence, generally leaving an overhang of one strand of the sequence, or “sticky end.” The Type-IIs endonucleases are unique because they generally do not require palindromic recognition sequences and they generally cleave outside of their recognition sites. For example, the Type-IIs endonuclease EarI recognizes and cleaves in the following manner:

               ↓
5′-C-T-C-T-T-C-N-N-N-N-N-3′ (SEQ ID NO: 1)
3′-G-A-G-A-A-G-n-n-n-n-n-5′ (SEQ ID NO: 2)
                        ↑

where the recognition sequence is -C-T-C-T-T-C-, N and n represent complementary, ambiguous base pairs and the arrows indicate the cleavage sites in each strand. As the example illustrates, the recognition sequence is non-palindromic, and the cleavage occurs outside of that recognition site.
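The distinction between palindromic and non-palindromic recognition sequences can be checked computationally. The following Python sketch, offered only as an illustration, tests whether a recognition sequence equals its own reverse complement; the helper names `reverse_complement` and `is_palindromic` are assumptions introduced here.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """A site is palindromic if it reads the same 5' to 3' on both strands."""
    return site.upper() == reverse_complement(site)


print(is_palindromic("GAATTC"))  # EcoRI recognition sequence
print(is_palindromic("CTCTTC"))  # EarI recognition sequence
```

The EcoRI sequence GAATTC is its own reverse complement, whereas the reverse complement of the EarI sequence CTCTTC is GAAGAG, so the two strands of an EarI site differ.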

Type-IIs endonucleases are generally commercially available and are well known in the art. Specific Type-IIs endonucleases which are useful in the present invention include, e.g., BbvI, BceAI, BfuAI, EarI, AlwI, BbsI, BsaI, BsmAI, BsmBI, BspMI, HgaI, SapI, SfaNI, BsmFI, FokI, and PleI. Other Type-IIs endonucleases that may be useful in the present invention may be found, for example, in the New England Biolabs catalogue. In some embodiments Type-IIs enzymes that generate a recessed 3′ end are particularly useful.

“Adaptor sequences” or “adaptors” are generally oligonucleotides of at least 5, 10, or 15 bases and preferably no more than 50 or 60 bases in length; however, they may be even longer, up to 100 or 200 bases. Adaptor sequences may be synthesized using any methods known to those of skill in the art. For the purposes of this invention they may, as options, comprise primer binding sites, recognition sites for endonucleases, common sequences and promoters. The adaptor may be entirely or substantially double stranded. A double stranded adaptor may comprise two oligonucleotides that are at least partially complementary. The adaptor may be phosphorylated or unphosphorylated on one or both strands. Adaptors may be more efficiently ligated to fragments if they comprise a substantially double stranded region and a short single stranded region which is complementary to the single stranded region created by digestion with a restriction enzyme. For example, when DNA is digested with the restriction enzyme EcoRI, the resulting double stranded fragments are flanked at either end by the single stranded overhang 5′-AATT-3′; an adaptor that carries a single stranded overhang 5′-AATT-3′ will hybridize to the fragment through complementarity between the overhanging regions. This “sticky end” hybridization of the adaptor to the fragment may facilitate ligation of the adaptor to the fragment but blunt ended ligation is also possible. Blunt ends can be converted to sticky ends using the exonuclease activity of the Klenow fragment. For example, when DNA is digested with PvuII the blunt ends can be converted to a two base pair overhang by incubating the fragments with Klenow in the presence of dTTP and dCTP. Overhangs may also be converted to blunt ends by filling in an overhang or removing an overhang.

Methods of ligation will be known to those of skill in the art and are described, for example, in Sambrook et al. (2001) and the New England BioLabs catalog, both of which are incorporated herein by reference for all purposes. Methods include using T4 DNA Ligase which catalyzes the formation of a phosphodiester bond between juxtaposed 5′ phosphate and 3′ hydroxyl termini in duplex DNA or RNA with blunt and sticky ends; Taq DNA Ligase which catalyzes the formation of a phosphodiester bond between juxtaposed 5′ phosphate and 3′ hydroxyl termini of two adjacent oligonucleotides which are hybridized to a complementary target DNA; E. coli DNA ligase which catalyzes the formation of a phosphodiester bond between juxtaposed 5′-phosphate and 3′-hydroxyl termini in duplex DNA containing cohesive ends; and T4 RNA ligase which catalyzes ligation of a 5′ phosphoryl-terminated nucleic acid donor to a 3′ hydroxyl-terminated nucleic acid acceptor through the formation of a 3′→5′ phosphodiester bond, where substrates include single-stranded RNA and DNA as well as dinucleoside pyrophosphates; or any other methods described in the art.

When a fragment has been digested on both ends with the same enzyme or two enzymes that leave the same overhang, the same adaptor may be ligated to both ends. Digestion with two or more enzymes can be used to selectively ligate separate adaptors to either end of a restriction fragment. For example, if a fragment is the result of digestion with EcoRI at one end and BamHI at the other end, the overhangs will be 5′-AATT-3′ and 5′-GATC-3′, respectively. An adaptor with an overhang of AATT will be preferentially ligated to one end while an adaptor with an overhang of GATC will be preferentially ligated to the second end.
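The preferential pairing of adaptors with fragment ends follows from complementarity of the overhangs and may be sketched as below. This Python example is illustrative only; the adaptor names `adaptor_1` and `adaptor_2` and the helper `can_anneal` are hypothetical, and the sketch considers base pairing of the overhangs alone, not ligation efficiency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_anneal(overhang_a, overhang_b):
    """Two overhangs (each written 5' to 3') can pair if one is the
    reverse complement of the other."""
    return overhang_a.upper() == reverse_complement(overhang_b)


# Overhangs left on a fragment cut with EcoRI at one end and BamHI at the other.
fragment_ends = {"EcoRI end": "AATT", "BamHI end": "GATC"}
# Hypothetical adaptors, identified by the overhang each one carries.
adaptors = {"adaptor_1": "AATT", "adaptor_2": "GATC"}

pairing = {
    end: [name for name, overhang in adaptors.items() if can_anneal(overhang, end_overhang)]
    for end, end_overhang in fragment_ends.items()
}
print(pairing)
```

Because AATT and GATC are each self-complementary, an adaptor carrying the same overhang as the fragment end anneals to it, while the AATT adaptor does not pair with the GATC end.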

An adaptor may be ligated to one or both strands of the fragmented DNA. In some embodiments a double stranded adaptor is used but only one strand is ligated to the fragments. Ligation of one strand of an adaptor may be selectively blocked. Any known method to block ligation of one strand may be employed. For example, one strand of the adaptor can be designed to introduce a gap of one or more nucleotides between the 5′ end of that strand of the adaptor and the 3′ end of the target nucleic acid. Adaptors can be designed specifically to be ligated to the termini produced by restriction enzymes and to introduce gaps or nicks. For example, if the target is an EcoRI digested fragment an adaptor with a 5′ overhang of TTA could be ligated to the AATT overhang left by EcoRI to introduce a single nucleotide gap between the adaptor and the 3′ end of the fragment. Phosphorylation and kinasing can also be used to selectively block ligation of the adaptor to the 3′ end of the target molecule. Absence of a phosphate from the 5′ end of an adaptor will block ligation of that 5′ end to an available 3′OH. For additional adaptor methods for selectively blocking ligation see U.S. Pat. No. 6,197,557 and U.S. Ser. No. 09/910,292 which are incorporated by reference herein in their entirety for all purposes.

Adaptors may also incorporate modified nucleotides that modify the properties of the adaptor sequence. For example, phosphorothioate groups may be incorporated in one of the adaptor strands. A phosphorothioate group is a modified phosphate group with one of the oxygen atoms replaced by a sulfur atom. In a phosphorothioated oligo (often called an “S-Oligo”), some or all of the internucleotide phosphate groups are replaced by phosphorothioate groups. The modified backbone of an S-Oligo is resistant to the action of most exonucleases and endonucleases. Phosphorothioates may be incorporated between all residues of an adaptor strand, or at specified locations within a sequence. A useful option is to sulfurize only the last few residues at each end of the oligo. This results in an oligo that is resistant to exonucleases, but has a natural DNA center.

The term “genotyping” refers to the determination of the genetic information an individual carries at one or more positions in the genome. For example, genotyping may comprise the determination of which allele or alleles an individual carries for a single SNP or the determination of which allele or alleles an individual carries for a plurality of SNPs. For example, a particular nucleotide in a genome may be an A in some individuals and a C in other individuals. Those individuals who have an A at the position have the A allele and those who have a C have the C allele. In a diploid organism the individual will have two copies of the sequence containing the polymorphic position so the individual may have an A allele and a C allele or alternatively two copies of the A allele or two copies of the C allele. Those individuals who have two copies of the C allele are homozygous for the C allele, those individuals who have two copies of the A allele are homozygous for the A allele, and those individuals who have one copy of each allele are heterozygous. The array may be designed to distinguish between each of these three possible outcomes. A polymorphic location may have two or more possible alleles and the array may be designed to distinguish between all possible combinations.
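The possible outcomes at a polymorphic position in a diploid organism can be enumerated as in the following Python sketch, which is provided for illustration only. The function names `call_genotype` and `possible_genotypes` are assumptions introduced here; the sketch classifies a pair of already-determined alleles and does not model how an array signal is converted into an allele call.

```python
from itertools import combinations_with_replacement


def call_genotype(allele_1, allele_2):
    """Classify the two alleles carried by a diploid individual at one position."""
    a, b = sorted((allele_1.upper(), allele_2.upper()))
    if a == b:
        return f"homozygous {a}"
    return f"heterozygous {a}/{b}"


def possible_genotypes(alleles):
    """List every unordered pair of alleles a diploid individual could carry."""
    return ["/".join(pair) for pair in combinations_with_replacement(sorted(alleles), 2)]


print(call_genotype("A", "A"))
print(call_genotype("C", "A"))
print(possible_genotypes("AC"))
print(len(possible_genotypes("ACG")))
```

For two alleles there are three possible genotypes; for a position with three alleles the same enumeration gives six combinations.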

Polymorphism refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population. A polymorphic marker or site is the locus at which divergence occurs. Preferred markers have at least two alleles, each occurring at a frequency of preferably greater than 1%, and more preferably greater than 10% or 20% of a selected population. A polymorphism may comprise one or more base changes, an insertion, a repeat, or a deletion. A polymorphic locus may be as small as one base pair. Polymorphic markers include restriction fragment length polymorphisms, variable number of tandem repeats (VNTR's), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple sequence repeats, insertion elements such as Alu or small insertions or deletions, for example, deletions or insertions of 1-10 bases. The first identified allelic form is arbitrarily designated as the reference form and other allelic forms are designated as alternative or variant alleles. The allelic form occurring most frequently in a selected population is sometimes referred to as the wild type form. Diploid organisms may be homozygous or heterozygous for allelic forms. When an organism carries two identical alleles the organism is homozygous at that position. When an organism carries two different alleles the organism is heterozygous at that position. Normal cells that are heterozygous at one or more loci may give rise to tumor cells that are homozygous at those loci. This loss of heterozygosity may result from structural deletion of normal genes or loss of the chromosome carrying the normal gene, mitotic recombination between normal and mutant genes, followed by formation of daughter cells homozygous for deleted or inactivated (mutant) genes; or loss of the chromosome with the normal gene and duplication of the chromosome with the deleted or inactivated (mutant) gene.

Single nucleotide polymorphisms (SNPs) are positions at which two alternative bases occur at appreciable frequency (generally greater than 1%) in the human population, and are the most common type of human genetic variation. The site is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations).

A single nucleotide polymorphism usually arises due to substitution of one nucleotide for another at the polymorphic site. A transition is the replacement of one purine by another purine or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine or vice versa. Single nucleotide polymorphisms can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a reference allele.
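The definitions of transition and transversion given above translate directly into a small classifier. The following Python sketch is illustrative; the function name `classify_substitution` is an assumption, and only the four DNA bases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not ({ref, alt} <= (PURINES | PYRIMIDINES)):
        raise ValueError("expected two different bases from A, C, G, T")
    # Purine to purine, or pyrimidine to pyrimidine, is a transition.
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))
print(classify_substitution("C", "T"))
print(classify_substitution("A", "C"))
```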

A diallelic polymorphism has two forms in a population. A triallelic polymorphism has three forms. A polymorphism between two nucleic acids can occur naturally, or be caused by exposure to or contact with chemicals, enzymes, or other agents, or exposure to agents that cause damage to nucleic acids, for example, ultraviolet radiation, mutagens or carcinogens.

The design and use of allele-specific probes for analyzing polymorphisms is described by, e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726; and Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a segment of target DNA from one individual but do not hybridize to the corresponding segment from another individual due to the presence of different polymorphic forms in the respective segments from the two individuals. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles.

The term “linkage” as used herein describes the tendency of genes, alleles, loci or genetic markers to be inherited together as a result of their location on the same chromosome. Linkage can be measured in various ways.

“Linkage disequilibrium” or “LD”, as used herein, refers to the preferential association of a particular allele or genetic marker with a specific allele or genetic marker at a nearby chromosomal location more frequently than expected by chance for any particular allele frequency in the population. For example, if locus X has alleles a and b, which occur equally frequently, and linked locus Y has alleles c and d, which occur equally frequently, one would expect the combination ac to occur with a frequency of 0.25. If ac occurs more frequently, then alleles a and c are in linkage disequilibrium.
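The example above can be worked numerically. The following Python sketch, provided for illustration, computes the haplotype frequency expected by chance as the product of the allele frequencies and the excess of the observed frequency over that expectation. The function names and the observed frequency of 0.40 are assumptions chosen for the example; the excess is the commonly used disequilibrium coefficient D, a term the text above does not itself use.

```python
def expected_haplotype_frequency(p_allele_1, p_allele_2):
    """Frequency of a two-allele combination if the alleles associate by chance."""
    return p_allele_1 * p_allele_2


def disequilibrium_coefficient(p_haplotype, p_allele_1, p_allele_2):
    """Observed haplotype frequency minus the frequency expected by chance."""
    return p_haplotype - expected_haplotype_frequency(p_allele_1, p_allele_2)


# Alleles a and c each occur with frequency 0.5, as in the example above.
expected_ac = expected_haplotype_frequency(0.5, 0.5)
# Hypothetical observed frequency of the ac combination.
d_ac = disequilibrium_coefficient(0.40, 0.5, 0.5)
print(expected_ac)
print(round(d_ac, 2))
```

A positive value indicates that the combination ac occurs more often than expected by chance, a value near zero is consistent with no association, and a negative value indicates that ac occurs less often than expected.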

Linkage disequilibrium may result from natural selection of certain combinations of alleles or because an allele has been introduced into a population too recently to have reached equilibrium with linked alleles. A marker in linkage disequilibrium can be particularly useful in detecting susceptibility to disease (or other phenotype) notwithstanding that the marker does not cause the disease. For example, a marker (X) that is not itself a causative element of a disease, but which is in linkage disequilibrium with a gene (including regulatory sequences) (Y) that is a causative element of a phenotype, can be detected to indicate susceptibility to the disease in circumstances in which the gene Y may not have been identified or may not be readily detectable.

Linkage can be analyzed by calculation of LOD (log of the odds) values. A lod value is the relative likelihood of obtaining observed segregation data for a marker and a genetic locus when the two are located at a recombination fraction (θ), versus the situation in which the two are not linked, and thus segregating independently (Thompson & Thompson, Genetics in Medicine (5th ed, W. B. Saunders Company, Philadelphia, 1991); Strachan, “Mapping the human genome” in The Human Genome (BIOS Scientific Publishers Ltd, Oxford), Chapter 4). A series of likelihood ratios are calculated at various recombination fractions (θ), ranging from θ=0.0 (coincident loci) to θ=0.50 (unlinked). Thus, the likelihood at a given value of θ is: probability of data if loci linked at θ to probability of data if loci unlinked. The computed likelihoods are usually expressed as the log₁₀ of this ratio (i.e., a lod score). For example, a lod score of 3 indicates 1000:1 odds against an apparent observed linkage being a coincidence. The use of logarithms allows data collected from different families to be combined by simple addition. Computer programs are available for the calculation of lod scores for differing values of θ (e.g., LIPED, MLINK (Lathrop, Proc. Nat. Acad. Sci. (USA) 81:3443-3446 (1984))). For any particular lod score, a recombination fraction may be determined from mathematical tables. See Smith et al., Mathematical tables for research workers in human genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet. 32:127-150 (1968). The value of θ at which the lod score is the highest is considered to be the best estimate of the recombination fraction.

Positive lod score values suggest that the two loci are linked, whereas negative values suggest that linkage is less likely (at that value of θ) than the possibility that the two loci are unlinked. By convention, a combined lod score of +3 or greater (equivalent to greater than 1000:1 odds in favor of linkage) is considered definitive evidence that two loci are linked. Similarly, by convention, a negative lod score of −2 or less is taken as definitive evidence against linkage of the two loci being compared. Negative linkage data are useful in excluding a chromosome or a segment thereof from consideration. The search focuses on the remaining non-excluded chromosomal locations.
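A lod score calculation of the kind described above may be sketched as follows. This Python example is a simplified illustration and is not a substitute for programs such as LIPED or MLINK: it assumes fully informative, phase-known meioses, so that the likelihood of the data at recombination fraction θ is θᵏ(1−θ)ⁿ⁻ᵏ for k recombinants among n meioses, and it searches a fixed grid of θ values. The function names and the example counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of (likelihood if linked at theta) / (likelihood if unlinked).

    Simplifying assumption: phase-known, fully informative meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0.0 and 0.5")
    nonrecombinants = meioses - recombinants
    likelihood_linked = (theta ** recombinants) * ((1.0 - theta) ** nonrecombinants)
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0.0:
        # Recombinants were observed, so theta = 0 is impossible.
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


def best_theta(recombinants, meioses, steps=50):
    """Grid value of theta (0.0 to 0.5) at which the lod score is highest."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: lod_score(recombinants, meioses, t))


# Two hypothetical families, each with 0 recombinants in 5 meioses.
families = [(0, 5), (0, 5)]
combined = sum(lod_score(k, n, 0.0) for k, n in families)
print(round(combined, 2))
print(best_theta(1, 10))
```

Scores from the two families are combined by simple addition, and in this example the combined score at θ=0.0 exceeds the conventional threshold of +3, which neither family reaches alone.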

The term “target sequence”, “target nucleic acid” or “target” refers to a nucleic acid of interest. The target sequence may or may not be of biological significance. As non-limiting examples, target sequences may include regions of genomic DNA which are believed to contain one or more polymorphic sites, DNA encoding or believed to encode genes or portions of genes of known or unknown function, DNA encoding or believed to encode proteins or portions of proteins of known or unknown function, and DNA encoding or believed to encode regulatory regions such as promoter sequences, splicing signals, polyadenylation signals, etc. The number of sequences to be interrogated can vary, but preferably are from about 1000, 2,000, 5,000, 10,000, 20,000 or 100,000 to 5000, 10,000, 100,000, 1,000,000 or 3,000,000 target sequences.

Statistical and computational methods for mapping of complex traits are disclosed, for example, in McKeigue et al., Am. J. Hum. Genet. 76:1-7 (2005), Lander and Schork, Science 265:2037-2048 (1994), McKeigue, Am. J. Hum. Genet. 63:241-251 (1998), and Patterson et al., Am. J. Hum. Genet. 74:979-1000 (2004).

Capture probes are oligonucleotides that have a 5′ common sequence and a 3′ locus or target-specific region or primer. The locus or target-specific region is designed to hybridize near a region of nucleic acid that includes a region of interest so that the locus or target-specific region of the capture probe can be used as a primer and be extended through the region of interest to make a copy of the region of interest. The common sequence in the capture probe may be used as a priming site in subsequent rounds of amplification using a common primer or a limited number of common primers. The same common sequence may be present in many or all of the capture probes in a collection of capture probes. Capture probes may also comprise other sequences, for example, tag sequences that are unique for different species of capture probes, and endonuclease recognition sites.

A tag or tag sequence is a selected nucleic acid with a specified nucleic acid sequence. A tag probe has a region that is complementary to a selected tag. A set of tags or a collection of tags is a collection of specified nucleic acids that may be of similar length and similar hybridization properties, for example similar T_(m). The tags in a collection of tags bind to tag probes with minimal cross hybridization so that a single species of tag in the tag set accounts for the majority of tags which bind to a given tag probe species under hybridization conditions. For additional description of tags and tag probes and methods of selecting tags and tag probes see U.S. Pat. Nos. 6,458,530 and 7,157,564 and EP 0799897, each of which is incorporated herein by reference in its entirety.

A collection of capture probes may be designed to interrogate a collection of target sequences. The collection would comprise at least one capture probe for each target sequence to be amplified and preferably one capture probe for each allele of each polymorphism being interrogated. There may be multiple different capture probes for a single target sequence in a collection of capture probes. For example, there may be a capture probe that hybridizes to one strand of the target sequence and a capture probe that hybridizes to the opposite strand of the target sequence; these may be referred to as a forward locus or target-specific primer and a reverse locus or target-specific primer. In preferred embodiments the capture probes have a region that is complementary to the region immediately 3′ of the polymorphic position.

(C) Generating Target Sequences Containing One or More Polymorphisms

Generally, the invention provides methods for generating a collection of target sequences containing one or more polymorphisms from a nucleic acid sample using extension of allele-specific probes (capture probes) that are modified to resist proof-reading activity. This allows for the use of a DNA polymerase that has a functional 3′ to 5′ proof reading activity as described in Lin-Ling et al., J Biochem Mol. Biol. 38:24-7 (2005), Liao et al., Acta Pharmacol Sin. 26:302-6 (2005), and Zhang et al., Trends Biotechnol. 23:92-96 (2005). The extension may be performed using capture probes that are attached to solid supports, such as beads or glass substrates, or capture probes in solution. Modifications that may be used include, for example, phosphorothioate linkages and locked nucleic acid (LNA) modifications that block mismatch excision during proofreading; without such a modification the mismatched base would simply be excised and the primer would be extended. Preferably the LNA residue is at the penultimate position of the primer. LNA is further described, for example, in Jepsen et al., Oligonucleotides 14:130-146 (2004) and Petersen and Wengel, Trends Biotechnol 21:74-81 (2003). The methods are related to those disclosed in U.S. Pat. No. 7,108,976, which is incorporated herein by reference in its entirety.

FIG. 1 shows a schematic of allele-specific amplification of a selected polymorphic locus. The polymorphism [101] in a double stranded segment of DNA is hybridized with an allele-specific primer that is complementary to one strand or the other [103 and 105]. AS_(F) designates the allele-specific forward primer and AS_(R) the allele-specific reverse primer. Similarly, LS_(F) and LS_(R) refer to the locus specific forward and reverse primers. In preferred aspects both strands are interrogated, but one of skill in the art will appreciate that either strand may be interrogated individually. The allele-specific primer is extended and the extension product is further amplified using the allele-specific primer and a locus specific primer [107 and 109] that is complementary to a region downstream of the polymorphism. A separate allele-specific primer is designed for each possible allele of the polymorphism. For example, for a biallelic polymorphism there are two allele-specific probes, an A and a B allele probe, for each strand being interrogated. If the polymorphism is a SNP with alleles C or T, the allele-specific probe to detect the C allele preferably terminates at the 3′ end with a G and the allele-specific probe to detect the T allele preferably terminates at the 3′ end with an A. In preferred aspects the 3′ terminal base in the allele-specific probe is complementary to the variable position or positions of the polymorphism. In another aspect the penultimate position in the probe is complementary to the variable position.
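The relationship between an allele and the 3′ terminal base of its allele-specific probe can be illustrated with a short sketch. It assumes a single interrogated strand written 5′ to 3′ and a probe that is complementary to the bases immediately 3′ of the polymorphic position plus the polymorphic base itself. The function names and the flanking sequence in the usage note are hypothetical and chosen only for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def allele_specific_probes(downstream, alleles):
    """Build one probe per allele for a single target strand.

    downstream: target bases immediately 3' of the polymorphic position (5' to 3').
    alleles:    iterable of possible bases at the polymorphic position.
    Each probe (5' to 3') anneals to the downstream bases and ends at its
    3' terminus with the base complementary to the allele it detects.
    """
    anchor = reverse_complement(downstream)
    return {allele: anchor + COMPLEMENT[allele.upper()] for allele in alleles}
```

For a C/T SNP followed by the hypothetical flank ATCGG, `allele_specific_probes("ATCGG", "CT")` gives a C-allele probe ending in G and a T-allele probe ending in A, in agreement with the paragraph above. Probes for the opposite strand would be built the same way from that strand's sequence.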

FIG. 2 shows a method of allele-specific primer extension using primers attached to beads [201]. The primer has a poly T region [207], a cleavage region [209], a tag region [211] and an allele-specific primer region [213] that is complementary to the target with the 3′ base being complementary to the interrogation position [203] in the target [205]. The target hybridizes to the allele-specific primer and if there is a perfect match between the 3′ end of the primer and the target, the primer is extended using a polymerase with exonuclease activity. Labeled nucleotides are present in the extension reaction and are incorporated into the extension product [215]. Following extension, the extension products may be separated from bound target by denaturation and from other nucleic acid in the sample by separation of the beads from the solution. The target strand may be removed, for example, by washing with 0.15 N NaOH, and the extension product may be cleaved in the cleavage region to release a portion [217] of the extension product that has the tag region, the allele-specific region and the labeled extended region. The released product may be detected by hybridization to a tag array. Each different allele to be detected has a different associated tag sequence which hybridizes to a different location on an array of tag probes. In some embodiments the cleavage region [209] includes one or more dU residues, one or more dI residues, a photo cleavable residue or an enzymatic cleavage site. In another embodiment capture probes and target form hybrids in solution and the hybrids are captured using a bead that has affinity for the capture probe. For example, the capture probe may be labeled at the 5′ end with digoxigenin and captured using a bead coated with anti-digoxigenin or the capture probe may be labeled at the 5′ end with biotin and captured with a streptavidin coated bead.

After release from the bead or solid support, the product [217] may then be analyzed by, for example, hybridization to an array as shown in FIG. 3. The figure shows two different extension products [301 and 302] hybridized to different features [311 and 312] of an array. Each feature contains many copies of a different oligonucleotide [307]. The oligonucleotide is complementary to the tag region [303 and 315] of the extended capture probes [301 and 302]. Different features have probes of different sequence so that each feature hybridizes to a different tag sequence. The allele-specific primer regions of the capture probes [305 and 309] remain single stranded. Information about the region of interest can be determined by analysis of the hybridization pattern. Detection of labeled capture probes at a feature indicates that the allele-specific portion of the capture probe attached to a specific tag sequence has been extended so that allele is present in the sample. Additional features [325] are shown. Preferably the array will have more than 1,000, more than 10,000 or more than 100,000 different features each with many copies of a different oligonucleotide probe sequence complementary to a different tag sequence. For a biallelic SNP, for example, each allele will have a distinct (allele-specific) capture probe that is labeled with a unique tag sequence so that the capture probe for each allele can be separately detected by hybridization to a distinct feature on the array. Individual features may also be beads and features may be present in multiples, for example, 2 or more beads or features with the same probe.
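The interpretation of a tag array hybridization pattern for biallelic SNPs may be sketched as follows. The sketch assumes that each tag feature has already been reduced to a single signal value and that a fixed threshold separates extended (labeled) from non-extended probes. The function name, the data layout and the threshold are hypothetical, and practical genotype calling would normally use a statistical model rather than a hard cutoff.

```python
def call_genotypes(tag_to_allele, signal, threshold):
    """Call biallelic genotypes from tag feature signals.

    tag_to_allele: {tag_id: (snp_id, allele)}, one tag per allele-specific probe.
    signal:        {tag_id: measured signal at the feature for that tag}.
    threshold:     signal at or above which a probe is treated as extended.
    Returns {snp_id: genotype}; for example "AA", "AB", "BB" or "no call".
    """
    detected = {}
    for tag, (snp, allele) in tag_to_allele.items():
        detected.setdefault(snp, [])
        if signal.get(tag, 0.0) >= threshold:
            detected[snp].append(allele)

    calls = {}
    for snp, alleles in detected.items():
        if len(alleles) == 1:
            calls[snp] = alleles[0] * 2            # one allele seen: homozygous
        elif len(alleles) == 2:
            calls[snp] = "".join(sorted(alleles))  # both alleles seen: heterozygous
        else:
            calls[snp] = "no call"                 # neither feature above threshold
    return calls
```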

In FIG. 4A the allele-specific capture probes [405, 407, 409 and 411] are attached to individual features [413] of an array. The 5′ end of the probes is associated with the solid support. In this example allele-specific probes are shown for each strand and each allele of the biallelic SNP [401]. Each feature associated with a given SNP interrogates one allele in a strand-specific manner. The discrimination position [415] changes according to the allele to be detected. A single phosphorothioate linkage [403] is shown immediately 5′ to the discrimination position.

In FIG. 4B genomic DNA is shown hybridized to the array in a locus specific manner. The capture probes are hybridized with perfectly matched target sequence. In a preferred aspect the target is prepared for hybridization by amplifying genomic DNA with an unbiased whole genome amplification method such as Multiple Displacement Amplification or OMNIPLEX amplification. Amplified DNA is fragmented to a size range optimal for hybridization using chemical or enzymatic means prior to hybridization to an array. Increasingly stringent washing conditions are used to remove non-specifically hybridized target and target that has a mismatch at the 3′ end of the probe. As shown in FIG. 4C some target may hybridize with a mismatch at the interrogation position, but as shown in FIG. 4D target hybridized with a mismatch should not be extended. FIG. 4D shows extension of the probes that are hybridized in a locus specific manner [441] with incorporation of label (B) and failure to extend and label the probes hybridized with a mismatch at the interrogation position [443]. The mismatch at the terminal base of the capture probe prevents extension of the probe by the DNA polymerase. The extension is performed on the array using a DNA polymerase with 3′ to 5′ exonuclease proof reading activity. The primer extension reactions may contain one or more labeled dNTPs, for example a biotinylated nucleotide such as DLR.

The amplified sample may be analyzed by any method known in the art, for example, MALDI-TOF mass spec, capillary electrophoresis, OLA, LCR, RCA, dynamic allele-specific hybridization (DASH) or TAQMAN® assays (Applied Biosystems, Foster City, Calif.). For other methods of genotyping analyses see Syvanen, Nature Rev. Gen. 2:930-942 (2001) which is incorporated by reference in its entirety.

In one embodiment a method for generating a collection of target sequences containing 3′ ends specific for each allele for a given SNP using a bead-based solid support is disclosed. (For a description of a bead-based solid support for amplifying complex genomic DNA, see Gunderson et al., Nat. Gen. 37:549-554 (2005) and U.S. Patent Pub. 20050059048, each of which is herein incorporated by reference in its entirety.) Each capture probe is attached covalently to a solid support and comprises a spacer sequence, multiple dU residues, a tag sequence that is unique for each species of capture probe, and a target-specific sequence. The solid support is preferably beads but any suitable solid support known in the art may be used, for example, arrays, microparticles, microtitre dishes and gels. In one aspect the bead-based solid support comprises anti-digoxigenin antibodies. The 3′ end of the capture probes terminates with a specific nucleotide corresponding to a polymorphism, including the SNP. The 3′ end of the capture probe comprises 0, 1, or 3 phosphorothioate linkages. The spacer sequence may comprise multiple T residues, such as T2 through T12. Preferably, the spacer sequence is a T6. It serves as a linker to move the multiple dU sequences away from the solid support. The multiple dU residues may comprise UIUI or UIUIUI wherein I represents an inosine base.

The multiple dU residues serve as a mechanism to release the oligonucleotides, such as the capture probes, from the solid support. The dU residues can be used with UNG to create an abasic site and then this can lead to back bone cleavage under conditions of high temperature and basic pH. An endonuclease comprising uracil DNA glycosylase (UDG) may be used to cleave the dU residues. Alternatively, dI residues can be used in conjunction with E. coli Endonuclease V to cleave the phosphodiester backbone. Beads with capture probes are loaded and hybridized with fragmented nucleic acid. The fragmented nucleic acid comprises genomic DNA. The beads are then washed and primer extension reaction of the capture probes is carried out using DNA polymerases with exo activity, including exo plus or minus, in combination with capture probes with the phosphorothioate linkage to achieve high allelic specificity. Such DNA polymerases may include mesophilic polymerases, such as T4 DNA polymerase, E. coli Klenow fragment, T7 DNA polymerase, or thermophilic DNA polymerases such as TAQ GOLD polymerase (Life Technologies), VENT polymerase and DEEP VENT polymerase (both New England Biolabs). The DNA polymerase preferably has a 3′→5′ exo proofreading activity. There is at least one species of labeled dNTP in each extension reaction. Biotin is one of the labels in one or more species of dNTPs. The labeled nucleotides are incorporated into the extended capture probes. There is one extension reaction wherein four differentially labeled dNTPs are present in the extension reaction. The capture probes are extended on the solid support in a 5′ to 3′ direction. Nucleic acid fragments, such as genomic DNA, hybridized to the extended capture probes are removed by washing the beads with 0.15 N NaOH. This condition denatures any DNA duplex and results in single stranded DNA. The extended capture probes are then cleaved from the beads either via photo or enzymatic cleavage and hybridized to an array-based solid support comprising a plurality of tag probes. For example, the extended capture probes may contain a photo cleavable biotin modification. Upon exposure to UV light, photo cleavage occurs. The wavelength of the UV light can range from 200 nanometers to 400 nanometers. Enzymatic cleavage includes use of heat or an endonuclease. The tag sequence of the extended capture probes is hybridized to an array comprising a plurality of tag probes. The hybridization pattern is analyzed on each of the arrays to determine at least one genotype. Target sequences containing the polymorphism on the solid support are generated. Each allele-specific capture probe is related to a unique tag array probe.
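The layout of a bead-bound capture probe and its release at the dU residues can be modeled in a few lines. The sketch treats the probe as a string written 5′ to 3′ with U standing for dU, and it reduces UNG treatment followed by backbone cleavage to discarding everything up to and including the last dU. The default of three dU residues, the function names and the example sequences are assumptions made for illustration; the T6 spacer follows the preferred spacer described above.

```python
def build_capture_probe(tag, target_specific, spacer_len=6, n_du=3):
    """Assemble a capture probe 5' to 3': T spacer, dU residues (written U),
    tag sequence, then the target-specific sequence."""
    return "T" * spacer_len + "U" * n_du + tag + target_specific


def release_from_support(probe, extension=""):
    """Model release of an extended capture probe from the solid support.

    The backbone is assumed to break at the abasic sites left at the dU
    residues, so the released product is the sequence 3' of the last dU:
    the tag, the target-specific region and any bases added by extension.
    """
    last_du = probe.rfind("U")
    if last_du < 0:
        raise ValueError("probe has no dU residue to cleave")
    return probe[last_du + 1:] + extension
```

With the hypothetical tag ACGTAC and target-specific region CCGATG, the probe is TTTTTTUUUACGTACCCGATG, and after an extension of AGC the released product is ACGTACCCGATGAGC, which carries the tag for hybridization to the tag array.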

In another embodiment, a method of generating a collection of target sequences containing a polymorphism, such as a SNP, from a nucleic acid sample on an array-based solid support is disclosed. A collection of capture probes comprises a plurality of different species of primers. A 5′ end of the capture probes is bound to an array surface with a spacer sequence and a 3′ variable region that is specific for a target sequence in a collection of target sequences. Optionally, the spacer sequence at the 5′ end of the capture probes may contain multiple T residues, ranging from T2 to T12. The 3′ end of the capture probes terminates with a specific nucleotide corresponding to a location of a SNP. This terminal base is associated with 0, 1, or 3 phosphorothioate linkages; the linkages make the capture probes resistant to proof-reading activity from the DNA polymerases. Each corresponding SNP location interrogates one allele in a strand-specific manner. Nucleic acid sample, such as genomic DNA, is first amplified using a whole genome amplification method, including MDA (using φ29 DNA polymerase) or OMNIPLEX amplification (Rubicon Genomics). The amplified genomic DNA is then fragmented to a size range optimal for array hybridization. The fragmented genomic DNA is hybridized to the array containing capture probes. After the hybridization reaction, the array is washed under a series of increasingly stringent conditions. The wash conditions for the array-based solid support to reduce non-specific hybridization may range from 2×SSC with 0.1% SDS at room temperature to 0.2×SSC with 0.1% SDS at an elevated temperature, such as 68° C. Such stringent conditions may prevent stable hybrids from forming with target sequences that have a mismatch to the 3′ end of the capture probes. The hybridization pattern may be analyzed to determine the identity of an allele or alleles present at one or more polymorphic locations in the collection of target sequences.

In many embodiments, the nucleic acid samples containing a SNP are amplified in solution without any adapters. The allele-specific forward primers and standard reverse primers for each strand are used to amplify a target-specific sequence. The allele-specific forward primers have 0, 1, or 3 phosphorothioate linkages. The DNA polymerases used have 3′ to 5′ exo proof-reading activity. These polymerases may comprise mesophilic polymerases. The DNA polymerases may include TAQ GOLD polymerase (Life Technologies), VENT polymerase, DEEP VENT polymerase (both New England Biolabs), T4 DNA polymerase, E. coli Klenow fragment, or T7 DNA polymerase. The primer extension reaction comprises full or partial substitution of one or more labeled dNTPs. The labeled dNTPs comprise biotin-labeled dNTPs. The array may be substituted for another solid support such as beads, microparticles, microtitre dishes, and gels. The capture probes are extended on a solid support in a 5′ to 3′ direction. The polymorphism comprises a SNP.

The hybridization and extension of capture probes are done while the capture probes are attached to a solid support. Following extension of the capture probes, nucleic acids that are not covalently attached to the solid support may be washed away. In some embodiments the extended capture probes are released from the solid support prior to amplification. In other embodiments amplification takes place while the extended capture probes are attached to the solid support. The extended capture probes may be released from the solid support by, for example, using a reversible linker or an enzymatic release, such as an endonuclease, or by a change in conditions that results in disruption of an interaction between the capture probe and the solid support. For example, when capture probes are associated with the solid support through base pairing between a tag in the capture probe and a tag probe on the solid support, disruption of the base pairing interaction releases the capture probes from the solid support. Enzymatic methods include, for example, use of uracil DNA glycosylase (UDG or UNG). UNG catalyzes the hydrolysis of DNA that contains deoxyuridine at the site the uridine is incorporated. UNG generates abasic sites and the abasic sites can be cleaved with an AP endonuclease, with acid or with heat treatment. Incorporation of one or more uridines in the capture probe followed by treatment with UNG will result in release of the capture probe from the solid support. A thermolabile UNG may also be used.

Many amplification methods are most efficient at amplification of smaller fragments. For example, PCR most efficiently amplifies fragments that are smaller than 2 kb (see, Saiki et al. 1988). The capture probes and fragmentation conditions are selected for efficient amplification of a selected collection of target sequences. The size of the amplified fragments is dependent on where the target-specific region of the capture probe hybridizes to the target sequence and the 5′ end of the fragment strand that the capture probe is hybridized to. In some embodiments of the present methods capture probes and fragmentation methods are designed so that the target sequence of interest can be amplified as a fragment that is, for example, less than 20,000, 2,000, 800, 500, 400, 200 or 100 base pairs long. The capture probe can be designed so that the 3′ end of the target-specific region hybridizes to the base that is just 3′ of a position to be interrogated in the target sequence. For example, if the sequence to be interrogated is a polymorphism and the sequence is 5′-GCTXATCGG-3′, where X is the polymorphic position, the target-specific region of the capture probe may have the sequence 5′---CCGAT-3′. When the sample is fragmented with site specific restriction enzymes the length of the fragments will also depend on the position of the nearest recognition site for the enzyme or enzymes used for fragmentation. A collection of target sequences may be selected based on proximity to restriction sites. The target sequences are selected for amplification and analysis based on the presence of a sequence of interest, such as a SNP, and proximity to a cleavage site for a selected restriction enzyme. For example, SNPs that are within 200, 500, 800, 1,000, 1,500, 2,000 or 20,000 base pairs of a restriction site, such as, for example, an EcoRI site, a BglI site, an XbaI site or any other restriction enzyme site, may be selected to be target sequences in a collection of target sequences. In another method a fragmentation method that randomly cleaves the sample into fragments that are 30, 100, 200, 500 or 1,000 to 100, 200, 500, 1,000 or 2,500 base pairs on average may be used.
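Selection of SNPs by proximity to a restriction site can be sketched computationally. The example below uses the EcoRI recognition sequence GAATTC and measures distance from the first base of each site, which is a simplification because the exact cut position within the site is ignored. The function name, the 0-based coordinate convention and the default distance are hypothetical.

```python
import re


def snps_near_site(sequence, snp_positions, site="GAATTC", max_distance=1000):
    """Return the SNP positions lying within max_distance bases of a site.

    sequence:      reference sequence, 5' to 3'.
    snp_positions: 0-based positions of SNPs in sequence.
    site:          recognition sequence (EcoRI by default).
    """
    # A lookahead pattern reports every occurrence, including overlapping ones.
    pattern = "(?=" + re.escape(site.upper()) + ")"
    site_starts = [m.start() for m in re.finditer(pattern, sequence.upper())]
    return [
        pos for pos in snp_positions
        if any(abs(pos - start) <= max_distance for start in site_starts)
    ]
```

The same function can be run once per enzyme, for example with the XbaI recognition sequence, and the results combined to assemble a collection of target sequences.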

To detect the allele or alleles present, the amplified fragments are digested with a Type IIs restriction endonuclease and the fragments are extended in the presence of labeled ddNTPs. The fragments will be extended by a single ddNTP which corresponds to the allele present at the polymorphic position. The extended fragments are hybridized to an array of tag probes and the labeled nucleotide or nucleotides present at each location are determined. In one embodiment the ddNTPs are all labeled with the same label, for example, biotin, and the fragments are extended in four separate reactions, one for each of the four different ddNTPs. Each reaction is hybridized to a different array so four arrays are used. In another embodiment the ddNTPs are labeled with differentially detectable labels. There may be four different labels, in which case the extension may be done in a single reaction and the hybridization may be to a single array. Alternatively there can be two different labels, in which case the extension may be done in two reactions and the hybridization may be to two different arrays.
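The three labeling schemes just described follow one piece of arithmetic: each reaction can resolve as many ddNTPs as there are distinguishable labels, and each reaction is hybridized to its own array. A minimal sketch, with a hypothetical function name, is given below; the value returned for three labels is an extrapolation that is not stated in the text.

```python
import math

NUM_TERMINATORS = 4  # ddATP, ddCTP, ddGTP and ddTTP


def extension_plan(distinct_labels):
    """Number of extension reactions and arrays for a given number of
    differentially detectable labels, one array per reaction."""
    if not 1 <= distinct_labels <= NUM_TERMINATORS:
        raise ValueError("distinct_labels must be between 1 and 4")
    reactions = math.ceil(NUM_TERMINATORS / distinct_labels)
    return {"reactions": reactions, "arrays": reactions}
```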

In the present methods one or more enrichment steps may be included to generate a sample that is enriched for extended capture probes prior to amplification with common sequence primers. It is desirable to separate extended capture probes from fragments from the starting nucleic acid sample, adapter-ligated fragments, adapter sequences or non-extended capture probes, for example. In one embodiment the capture probes are extended in the presence of a labeled dNTP, for example dNTPs labeled with biotin. The labeled nucleotides are incorporated into the extended capture probes and the labeled extended capture probes are then separated from non-extended material by affinity chromatography. When the label is biotin the labeled extended capture probes can be isolated based on the affinity of biotin for avidin, streptavidin or a monoclonal anti-biotin antibody. In one embodiment the antibody may be coupled to protein-A agarose, protein-A sepharose or any other suitable solid support known in the art. Those of skill in the art will appreciate that biotin is one label that may be used but any other suitable label or a combination of labels may also be used, such as fluorescein, which may be incorporated in the extended capture probe, and an anti-fluorescein antibody may be used for affinity purification of extended capture probes. Other labels such as digoxigenin, Cyanine-3, Cyanine-5, Rhodamine, and TEXAS RED (Molecular Probes) may also be used. Antibodies to these labeling compounds may be used for affinity purification. Also, other haptens conjugated to dNTPs may be used, such as, for example, dinitrophenol (DNP).

The extension products may be enriched by circularization followed by digestion with a nuclease such as Exonuclease VII or Exonuclease III. The extended capture probes may be circularized, for example, by hybridizing the ends of the extended capture probe to an oligonucleotide splint so that the ends are juxtaposed and ligating the ends together. The splint will hybridize to the A1 and A2 sequences in the extended capture probe and bring the 5′ end of the capture probe next to the 3′ end of the capture probe so that the ends may be ligated by a ligase, for example DNA Ligase or AMPLIGASE Thermostable DNA Ligase. See, for example, U.S. Pat. No. 5,871,921 which is incorporated herein by reference. The circularized product will be resistant to nucleases that require either a free 5′ or 3′ end.

A variety of nucleases may be used in one or more of the embodiments. Nucleases that are commercially available and may be useful in the present methods include: Mung Bean Nuclease, E. coli Exonuclease I, Exonuclease III, Exonuclease VII, T7 Exonuclease, BAL-31 Exonuclease, Lambda Exonuclease, RecJ_(f), and Exonuclease T. Different nucleases have specificities for different types of nucleic acids making them useful for different applications. Exonuclease I catalyzes the removal of nucleotides from single-stranded DNA in the 3′ to 5′ direction. Exonuclease I degrades excess single-stranded primer oligonucleotide from a reaction mixture containing double-stranded extension products. Exonuclease III catalyzes the stepwise removal of mononucleotides from 3′-hydroxyl termini of duplex DNA. A limited number of nucleotides are removed during each binding event, resulting in coordinated progressive deletions within the population of DNA molecules. The preferred substrates are blunt or recessed 3′-termini, although the enzyme also acts at nicks in duplex DNA to produce single-strand gaps. The enzyme is not active on single-stranded DNA, and thus 3′-protruding termini are resistant to cleavage. The degree of resistance depends on the length of the extension, with extensions 4 bases or longer being essentially resistant to cleavage. This property can be exploited to produce unidirectional deletions from a linear molecule with one resistant (3′-overhang) and one susceptible (blunt or 5′-overhang) terminus. Exonuclease VII is a single-strand directed enzyme with 5′ to 3′- and 3′ to 5′-exonuclease activities making it the only bi-directional E. coli exonuclease with single-strand specificity. The enzyme has no apparent requirement for divalent cation, and is fully active in the presence of EDTA. Initial reaction products are acid-insoluble oligonucleotides which are further hydrolyzed into acid-soluble form. The products of limit digests are small oligomers (dimers to dodecamers). For additional information about nucleases see catalogs from manufacturers such as New England Biolabs, Beverly, Mass.
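The Exonuclease III end rules stated above (blunt and recessed 3′ termini are substrates; 3′ overhangs of 4 bases or longer are essentially resistant) can be written as a small decision function. The labels for end types, the function names and the intermediate category for short 3′ overhangs are hypothetical conventions adopted for this sketch.

```python
def exo3_end_status(end_type, overhang_length=0):
    """Classify one terminus of a linear duplex for Exonuclease III.

    end_type: "blunt", "5_overhang" (a recessed 3' end) or "3_overhang".
    overhang_length: number of protruding bases; ignored for blunt ends.
    """
    if end_type in ("blunt", "5_overhang"):
        return "susceptible"
    if end_type == "3_overhang":
        # Resistance grows with overhang length; 4 or more bases are
        # treated as essentially resistant, as described in the text.
        return "resistant" if overhang_length >= 4 else "partially susceptible"
    raise ValueError("unknown end type: " + repr(end_type))


def supports_unidirectional_deletion(left_end, right_end):
    """True when exactly one terminus is resistant and the other susceptible.

    Each argument is a tuple of (end_type, overhang_length).
    """
    statuses = {exo3_end_status(*left_end), exo3_end_status(*right_end)}
    return statuses == {"resistant", "susceptible"}
```

For example, a molecule with a 4 base 3′ overhang at one terminus and a blunt terminus at the other satisfies `supports_unidirectional_deletion(("3_overhang", 4), ("blunt", 0))`.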

In some embodiments one of the primers added for PCR amplification is modified so that it is resistant to nuclease digestion, for example, by the inclusion of phosphorothioate. Prior to hybridization to an array one strand of the double stranded fragments may be digested by a 5′ to 3′ exonuclease such as T7 Gene 6 Exonuclease.

In some embodiments the nucleic acid sample, which may be, for example, genomic DNA, is fragmented, using for example, a restriction enzyme, DNase I or a non-specific fragmentation method such as that disclosed in U.S. Pat. No. 6,495,320, which is incorporated herein by reference in its entirety. Adaptors containing at least one priming site are ligated to the fragmented DNA. Locus-specific primers are synthesized which contain a different adaptor sequence at the 5′ end. The adaptor-ligated genomic DNA is hybridized to the locus-specific primers and the locus specific primer is extended. This may be done for example, by the addition of DNA polymerase and dNTPs. Extension products may be amplified with primers that are specific for the adaptor sequences. This allows amplification of a collection of many different sequences using a limited set of primers. For example, a single set of primers may be used for amplification. In another embodiment a second amplification step is carried out using the same or different primers.

In some embodiments a collection of target sequences is analyzed. A plurality of capture probes is designed for a plurality of target sequences. In some embodiments target sequences contain or are predicted to contain a polymorphism, for example, a SNP. The polymorphism may be, for example, near a gene that is a candidate marker for a phenotype, useful for diagnosis of a disorder or for carrier screening, or the polymorphism may define a haplotype block (see, Daly et al. Nat. Genet. 29:229-32 (2001), and Rioux et al. Nat. Genet. 29:223-8 (2001) and U.S. Patent Publication Number 20030170665 each of which is incorporated herein by reference in its entirety). A collection of capture probes may be designed so that capture probes hybridize near a polymorphism, for example, within 1, 5, 10, or 100 to 5, 10, 100, 1000, 10,000 or 100,000 bases from the polymorphism. The capture probes hybridize to one strand of the target sequence and can be extended through the polymorphic site or region so that the extension product comprises a copy of the polymorphic region.

The amplified products are analyzed by hybridization to an array of probes attached to a solid support. In some embodiments an array of probes is specifically designed to interrogate a collection of target sequences. The array of probes may interrogate, for example, from 1,000, 5,000, 10,000 or 100,000 to 2,000, 5,000, 10,000, 100,000, 1,000,000 or 3,000,000 different target sequences. In one embodiment the target sequences contain SNPs and the array of probes is designed to interrogate the allele or alleles present at one or more polymorphic locations. The array may comprise a collection of probes that hybridize specifically to one or more SNP containing sequences. The array may comprise probes that correspond to different alleles of the SNP. One probe or probe set may hybridize specifically to a first allele of a SNP, but not hybridize significantly to other alleles of the SNP and a second probe set may be designed to hybridize to a second allele of a SNP but not hybridize significantly to other alleles. A hybridization pattern from the array indicates which of the alleles are present in the sample. An array may contain probe sets to interrogate, for example, from 1,000, 5,000, 10,000 or 100,000 to 2,000, 5,000, 10,000, 100,000, 1,000,000 or 3,000,000 different SNPs.

An array of probes that are complementary to tag sequences present in the capture probes is used to interrogate the target sequences. In some embodiments the amplified targets are analyzed on an array of tag sequences, for example, the Affymetrix GENFLEX® array or Universal Tag Array (3K, 5K, 10K or 25K) (Affymetrix, Inc., Santa Clara, Calif.). In this embodiment the capture probes comprise a tag sequence that is unique for each species of capture probe. A detectable label that is indicative of the allele present at the polymorphic site of interest is associated with the tag. The labeled tags are hybridized to the one or more arrays and the hybridization pattern is analyzed to determine which alleles are present.

(D) Multiplexed Anchored Runoff Amplification

Generally, the invention provides methods for highly multiplexed locus specific amplification of nucleic acids and methods for analysis of the amplified products. In some embodiments the invention combines the use of capture probes that comprise a common sequence and a locus-specific region with adaptor-modified sample nucleic acid; the adaptor comprises a second common sequence. The capture probes are extended to produce copies of the sample DNA that contain common priming sequences flanking the target sequence. The copies are amplified with a generic set of primers that recognize the common sequences. The amplified product may be analyzed by hybridization to an array of probes.

In one embodiment the steps of the invention comprise: generating capture probes; digesting a nucleic acid sample; ligating adaptors to the fragmented sample; mixing the fragments and the capture probes under conditions that will allow hybridization of the fragments and the capture probes; extending the capture probes in the presence of dNTPs and polymerase; amplifying the extended capture probes; and detecting the presence or absence of target sequences of interest.

One embodiment of the methods is illustrated in FIG. 5. Capture probes are designed with a locus specific region (LS1_(F) and LS1_(R)) that hybridizes near a target sequence of interest and a common sequence (A1) that is 5′ of the locus specific region. The common priming site may be present in a plurality of capture probes so that a primer to A1 may be used for amplification of a plurality of different targets in subsequent steps. The capture probes are attached to a solid support so that they have a free 3′ end. A plurality of a single species of capture probes may be synthesized at a discrete location on an array and may form a discrete feature of an array. Each feature of the array may contain a different species of locus specific capture probe.

Genomic DNA is fragmented and adaptors comprising a second common sequence (A2) are ligated to the fragments. The adaptor-ligated fragments are then mixed with the capture probes under conditions that allow hybridization of the fragments to the capture probes on the array. The capture probes are then extended using the adaptor-ligated fragments as template. The extension product has a common sequence, A1, near its 5′ end and a second common sequence A2 near its 3′ end. These common sequences flank a region of interest. The capture probes are then released from the array and extended capture probes are amplified by PCR using primers to the common sequences A1 and A2. The amplified product may then be analyzed by, for example, hybridization to an array. Information about the region of interest can be determined by analysis of the hybridization pattern.
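The structure of the extension product, with A1 near its 5′ end and the copied adaptor sequence near its 3′ end, can be illustrated with an in silico extension. The sketch assumes a single template strand written 5′ to 3′ that carries the ligated adaptor at its 5′ end, exact matching of the locus specific region, and run-off extension to the end of the template. All sequences in the usage note are short hypothetical placeholders.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def extend_capture_probe(common_a1, locus_specific, template):
    """Extend a capture probe (A1 + locus specific region) on a template.

    template: adaptor-ligated fragment strand, 5' to 3', adaptor at its 5' end.
    Returns the extension product 5' to 3', or None if the locus specific
    region finds no perfectly complementary site on the template.
    """
    binding_site = reverse_complement(locus_specific)
    start = template.upper().find(binding_site)
    if start < 0:
        return None
    # The polymerase copies the template from the binding site to the
    # template's 5' end, which includes the ligated adaptor (run-off).
    copied = reverse_complement(template[:start])
    return common_a1.upper() + locus_specific.upper() + copied
```

With a hypothetical A1 of CAGT, locus specific region CCGAT and template TTCCAGCTCATCGGTTT (adaptor TTCC at the 5′ end), the product is CAGTCCGATGAGCTGGAA. It begins with A1 and ends with the complement of the adaptor, so one primer matching A1 and one primer matching the adaptor sequence can amplify it.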

A second embodiment of the methods is illustrated in FIG. 6. Capture probes are designed with a locus specific region (LS1 or LS2) and a common sequence (A1) as in FIG. 5. In this embodiment the capture probes further comprise a tag sequence that is unique for each species of capture probe designed. (For a description of tags and tag probes, see, U.S. Ser. No. 08/626,285.) The capture probes are attached to the array through hybridization of the tag sequence to a substantially complementary tag probe sequence that is attached to the array. The tag probes may be attached to the array in discrete locations. Different species of tag probes are present at different discrete, spatially addressable locations. Adaptor-ligated genomic DNA is hybridized to the array so that the capture probes hybridize to target sequences in the sample. The capture probes are extended as in FIG. 5 to incorporate the target sequence and common sequence A2. The extended capture probes are released and amplified using primers A1 and A2. The amplified product may then be analyzed by, for example, hybridization to an array. Information about the region of interest can be determined by analysis of the hybridization pattern. The amplified sample may be analyzed by any method known in the art, for example, MALDI-TOF mass spec, capillary electrophoresis, OLA, dynamic allele-specific hybridization (DASH) or TaqMan® (Applied Biosystems, Foster City, Calif.). For other methods of genotyping analyses see Syvanen, Nature Rev. Gen. 2:930-942 (2001) which is herein incorporated by reference in its entirety.

In some embodiments the capture probes are attached to a solid support prior to hybridization and hybridization takes place while the capture probes are attached to the solid support. In some embodiments the capture probes are synthesized on a solid support. Any suitable solid support known in the art may be used, for example, arrays, beads, microparticles, microtitre dishes and gels. In some embodiments the capture probes are synthesized on an array in a 5′ to 3′ direction.

In some embodiments hybridization and extension of capture probes are done while the capture probes are attached to a solid support. Following extension of the capture probes nucleic acids that are not covalently attached to the solid support may be washed away. In some embodiments the extended capture probes are released from the solid support prior to amplification. In another embodiment amplification takes place while the extended capture probes are attached to the solid support. The extended capture probes may be released from the solid support by, for example, using a reversible linker, an enzymatic release, such as an endonuclease, or a change in conditions that results in disruption of an interaction between the capture probe and the solid support. For example, when capture probes are associated with the solid support through base pairing between a tag in the capture probe and a tag probe on the solid support, disruption of the base pairing interaction releases the capture probes from the solid support. Enzymatic methods include, for example, use of uracil DNA glycosylase (UDG or UNG). UNG catalyzes the hydrolysis of DNA that contains deoxyuridine at the site the uridine is incorporated. Incorporation of one or more uridines in the capture probe followed by treatment with UNG will result in release of the capture probe from the solid support. A thermolabile UNG may also be used.

In some embodiments a collection of target sequences is analyzed. A plurality of capture probes is designed for a plurality of target sequences. In some embodiments target sequences contain or are predicted to contain a polymorphism, for example, a SNP. The polymorphism may be, for example, near a gene that is a candidate marker for a phenotype, useful for diagnosis of a disorder or for carrier screening or the polymorphism may define a haplotype block (see, Daly et al. Nat. Genet. 29:229-32 (2001), and Rioux et al. Nat. Genet. 29:223-8 (2001) and U.S. patent application Ser. No. 10/213,272, now abandoned, each of which is incorporated herein by reference in its entirety). A collection of capture probes may be designed so that capture probes hybridize near a polymorphism, for example, within 1, 5, 10, or 100 to 5, 10, 100, 1000, 10,000 or 100,000 bases from the polymorphism. The capture probes hybridize to one strand of the target sequence and can be extended through the polymorphic site or region so that the extension product comprises a copy of the polymorphic region.

Many amplification methods are most efficient at amplification of smaller fragments. For example, PCR most efficiently amplifies fragments that are smaller than 2 kb (see, Saiki et al. 1988). In one embodiment capture probes and fragmentation conditions are selected for efficient amplification of a selected collection of target sequences. The size of the amplified fragments is dependent on where the target-specific region of the capture probe hybridizes to the target sequence and the 5′ end of the fragment strand that the capture probe is hybridized to. In some embodiments of the present methods capture probes and fragmentation methods are designed so that the target sequence of interest can be amplified as a fragment that is, for example, less than 20,000, 2,000, 800, 500, 400, 200 or 100 base pairs long. The capture probe can be designed so that the 3′ end of the target-specific region hybridizes to the base that is just 3′ of a position to be interrogated in the target sequence. For example, if the sequence to be interrogated is a polymorphism and the sequence is 5′-GCTXATCGG-3′, where X is the polymorphic position, the target-specific region of the capture probe may have the sequence 5′---CCGAT-3′. When the sample is fragmented with site specific restriction enzymes the length of the fragments will also depend on the position of the nearest recognition site for the enzyme or enzymes used for fragmentation. A collection of target sequences may be selected based on proximity to restriction sites. In some embodiments target sequences are selected for amplification and analysis based on the presence of a sequence of interest, such as a SNP, and proximity to a cleavage site for a selected restriction enzyme. For example, SNPs that are within 200, 500, 800, 1,000, 1,500, 2,000 or 20,000 base pairs of a restriction site, such as, for example, an EcoRI site, a BglI site, an XbaI site or any other restriction enzyme site may be selected to be target sequences in a collection of target sequences. In another method a fragmentation method that randomly cleaves the sample into fragments that are 30, 100, 200, 500 or 1,000 to 100, 200, 500, 1,000 or 2,500 base pairs on average may be used.
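The two selection rules in the preceding paragraph (placing the 3′ end of the target-specific region on the base just 3′ of the polymorphic position, and choosing SNPs by distance to a restriction site) can be illustrated with a short Python sketch. The function names, the distance measure (SNP index to the first base of the recognition site) and the default cutoff are assumptions made for illustration only.

```python
# Illustrative helpers for capture-probe design and SNP selection.
# Function names, distance definition and cutoffs are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def target_specific_region(target: str, snp_index: int, length: int) -> str:
    """Probe region whose 3' end pairs with the base just 3' of the SNP.

    `target` is written 5' to 3' and `snp_index` is the 0-based position
    of the polymorphic base within it.
    """
    flank = target[snp_index + 1 : snp_index + 1 + length]
    if len(flank) < length:
        raise ValueError("not enough sequence 3' of the polymorphic position")
    return reverse_complement(flank)


def snps_near_site(sequence, snp_positions, site="GAATTC", max_distance=1000):
    """Keep SNP positions within max_distance bases of a recognition site.

    The default site is the EcoRI recognition sequence; distance is
    measured from the SNP index to the first base of the site.
    """
    site_starts = []
    pos = sequence.find(site)
    while pos != -1:
        site_starts.append(pos)
        pos = sequence.find(site, pos + 1)
    return [
        snp for snp in snp_positions
        if any(abs(snp - start) <= max_distance for start in site_starts)
    ]


# Example from the text: target 5'-GCTXATCGG-3' with X at index 3.
print(target_specific_region("GCTXATCGG", 3, 5))  # CCGAT
```

For the sequence given in the text the helper returns CCGAT, matching the target-specific region stated above.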

In another embodiment, illustrated in FIG. 7, the capture probes are in solution and hybridization and extension take place in solution. In this embodiment the nucleic acid sample is fragmented and an adaptor containing common sequences A2 and A3 is ligated to the fragments. In some embodiments one strand of the adaptor, the strand that is ligated to the 3′ end of the fragment strands, lacks common sequence A2 and is blocked from extension at the 3′ end. Ligation of the blocked adaptor strand to the 3′ end of the fragment strands prevents the fragments from being extended to incorporate A2 at both ends, thus preventing amplification of the fragments by primer A2 in the subsequent PCR amplification step. Capture probes with locus specific regions and common sequence A1 are mixed with the adaptor-ligated fragments under conditions that allow hybridization of the capture probes to the adaptor ligated fragments. The capture probes are extended in the presence of polymerase and dNTPs. In some embodiments the extended capture probes are positively selected to generate a sample that is enriched for extended capture probes. In another embodiment extended capture probes are enriched by depleting non-extended products.

In another embodiment the capture probes comprise a first common sequence, a tag sequence, a target sequence and a recognition sequence for a Type IIs restriction enzyme (see, FIGS. 8A and 8B, SEQ ID NOS: 4-12). The Type IIs recognition site is inserted within the target-specific region so that there is target-specific sequence on either side of the Type IIs recognition sequence and the tag sequence is 3′ of the common sequence. In many embodiments there will be one or more mismatches between the probe and the target at the site of the Type IIs site. In some embodiments the Type IIs site is positioned so that when the fragment is digested the enzyme cuts between the polymorphic position and the base just 5′ of the polymorphic position. The nucleic acid sample is fragmented and ligated to adaptors comprising a second common sequence. The capture probes and adaptor-ligated fragments are mixed under conditions that allow hybridization and the capture probes are extended. The extended capture probes are then made double stranded using a primer that is complementary to the adaptor. The double stranded extended capture probes are amplified using primers to the common sequence in the capture probe and the common sequence in the adaptor.

To detect the allele or alleles present the amplified fragments are digested with a Type IIs restriction endonuclease and the fragments (FIG. 8B) are extended in the presence of labeled ddNTPs. The fragments will be extended by a single ddNTP which corresponds to the allele present at the polymorphic position. The extended fragments are hybridized to an array of tag probes and the labeled nucleotide or nucleotides present at each location are determined. In one embodiment the ddNTPs are all labeled with the same label, for example, biotin and the fragments are extended in four separate reactions, one for each of the four different ddNTPs. Each reaction is hybridized to a different array so four arrays are used. In another embodiment the ddNTPs are labeled with differentially detectable labels. In one embodiment there are four different labels and the extension reaction may be done in a single reaction and the hybridization may be to a single array. In another embodiment there are two different labels and the extension reaction may be done in two reactions and the hybridization may be to two different arrays.
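As a rough illustration of how the labeled nucleotide or nucleotides at one tag location might be turned into an allele call, the following Python sketch takes one background-subtracted signal per ddNTP (from four arrays or four label channels) and reports which bases exceed a fraction of the total signal. The function name `call_from_ddntp_signals` and the 0.25 threshold are hypothetical; the disclosure does not specify a calling rule.

```python
# Hypothetical allele call from single-base ddNTP extension signals
# measured at one tag probe location. Threshold is illustrative only.

def call_from_ddntp_signals(signals, threshold=0.25):
    """Return a two-letter call such as 'CC' or 'CT', or None for no call.

    `signals` maps each ddNTP base ('A', 'C', 'G', 'T') to its
    background-subtracted intensity at a single tag location.
    A base is counted as present when it carries at least `threshold`
    of the total signal.
    """
    total = sum(signals.values())
    if total <= 0:
        return None
    present = sorted(b for b, v in signals.items() if v / total >= threshold)
    if len(present) == 1:
        return present[0] * 2          # one base detected: homozygous
    if len(present) == 2:
        return "".join(present)        # two bases detected: heterozygous
    return None                        # ambiguous pattern


print(call_from_ddntp_signals({"A": 5, "C": 900, "G": 10, "T": 8}))
print(call_from_ddntp_signals({"A": 4, "C": 480, "G": 6, "T": 510}))
```

The call is reported in terms of the incorporated base; which strand that base corresponds to depends on the orientation of the capture probe.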

In many embodiments of the present methods one or more enrichment steps may be included to generate a sample that is enriched for extended capture probes prior to amplification with common sequence primers (see, FIGS. 9-11). In some embodiments it is desirable to separate extended capture probes from fragments from the starting nucleic acid sample, adaptor-ligated fragments, adaptor sequences or non-extended capture probes, for example. In one embodiment (FIG. 9) the capture probes are extended in the presence of a labeled dNTP, for example dNTPs labeled with biotin. The labeled nucleotides are incorporated into the extended capture probes and the labeled extended capture probes are then separated from non-extended material by affinity chromatography. When the label is biotin the labeled extended capture probes can be isolated based on the affinity of biotin for avidin, streptavidin or a monoclonal anti-biotin antibody. In one embodiment the antibody may be coupled to protein-A agarose, protein-A sepharose or any other suitable solid support known in the art. Those of skill in the art will appreciate that biotin is one label that may be used but any other suitable label or a combination of labels may also be used, such as fluorescein which may be incorporated in the extended capture probe and an anti-fluorescein antibody may be used for affinity purification of extended capture probes. Other labels such as digoxigenin, Cyanine-3, Cyanine-5, Rhodamine, and Texas Red may also be used. Antibodies to these labeling compounds may be used for affinity purification. Also, other haptens conjugated to dNTPs may be used, such as, for example, dinitrophenol (DNP).

In another embodiment (FIG. 10) capture probes that have been extended through the adaptor sequence (A2) on the adaptor modified DNA are made double stranded by hybridizing and extending an A2 primer. Only the fully extended capture probes will have the A2 priming site so partially extended capture probes will remain single-stranded. The sample is then digested with a nuclease that selectively digests single stranded nucleic acid, such as E. coli Exonuclease I. The sample is then amplified with primers A1 and A2.

In another embodiment (FIG. 11) extension products may be enriched by circularization followed by digestion with a nuclease such as Exonuclease VII or Exonuclease III. The extended capture probes may be circularized, for example, by hybridizing the ends of the extended capture probe to an oligonucleotide splint so that the ends are juxtaposed and ligating the ends together. The splint will hybridize to the A1 and A2 sequences in the extended capture probe and bring the 5′ end of the capture probe next to the 3′ end of the capture probe so that the ends may be ligated by a ligase, for example DNA Ligase or Ampligase Thermostable DNA Ligase. See, for example, U.S. Pat. No. 5,871,921 which is incorporated herein by reference. The circularized product will be resistant to nucleases that require either a free 5′ or 3′ end.

A variety of nucleases may be used in one or more of the embodiments. Nucleases that are commercially available and may be useful in the present methods include: Mung Bean Nuclease, E. coli Exonuclease I, Exonuclease III, Exonuclease VII, T7 Exonuclease, BAL-31 Exonuclease, Lambda Exonuclease, RecJ_(f), and Exonuclease T. Different nucleases have specificities for different types of nucleic acids making them useful for different applications. Exonuclease I catalyzes the removal of nucleotides from single-stranded DNA in the 3′ to 5′ direction. Exonuclease I degrades excess single-stranded primer oligonucleotide from a reaction mixture containing double-stranded extension products. Exonuclease III catalyzes the stepwise removal of mononucleotides from 3′-hydroxyl termini of duplex DNA. A limited number of nucleotides are removed during each binding event, resulting in coordinated progressive deletions within the population of DNA molecules. The preferred substrates are blunt or recessed 3′-termini, although the enzyme also acts at nicks in duplex DNA to produce single-strand gaps. The enzyme is not active on single-stranded DNA, and thus 3′-protruding termini are resistant to cleavage. The degree of resistance depends on the length of the extension, with extensions 4 bases or longer being essentially resistant to cleavage. This property can be exploited to produce unidirectional deletions from a linear molecule with one resistant (3′-overhang) and one susceptible (blunt or 5′-overhang) terminus. Exonuclease VII is a single-strand directed enzyme with 5′ to 3′- and 3′ to 5′-exonuclease activities making it the only bi-directional E. coli exonuclease with single-strand specificity. The enzyme has no apparent requirement for divalent cation, and is fully active in the presence of EDTA. Initial reaction products are acid-insoluble oligonucleotides which are further hydrolyzed into acid-soluble form. The products of limit digests are small oligomers (dimers to dodecamers). For additional information about nucleases see catalogues from manufacturers such as New England Biolabs, Beverly, Mass.

In some embodiments one of the primers added for PCR amplification is modified so that it is resistant to nuclease digestion, for example, by the inclusion of phosphorothioate. Prior to hybridization to an array one strand of the double stranded fragments may be digested by a 5′ to 3′ exonuclease such as T7 Gene 6 Exonuclease.

In some embodiments the nucleic acid sample, which may be, for example, genomic DNA, is fragmented, using for example, a restriction enzyme, DNase I or a non-specific fragmentation method such as that disclosed in U.S. Pat. No. 6,495,320, which is incorporated herein by reference in its entirety. Adaptors containing at least one priming site are ligated to the fragmented DNA. Locus-specific primers are synthesized which contain a different adaptor sequence at the 5′ end. The adaptor-ligated genomic DNA is hybridized to the locus-specific primers and the locus specific primer is extended. This may be done for example, by the addition of DNA polymerase and dNTPs. Extension products may be amplified with primers that are specific for the adaptor sequences. This allows amplification of a collection of many different sequences using a limited set of primers. For example, a single set of primers may be used for amplification. In another embodiment a second amplification step is carried out using the same or different primers.

In some embodiments the amplified products are analyzed by hybridization to an array of probes attached to a solid support. In some embodiments an array of probes is specifically designed to interrogate a collection of target sequences. The array of probes may interrogate, for example, from 1,000, 5,000, 10,000 or 100,000 to 2,000, 5,000, 10,000, 100,000, 1,000,000 or 3,000,000 different target sequences. In one embodiment the target sequences contain SNPs and the array of probes is designed to interrogate the allele or alleles present at one or more polymorphic locations. The array may comprise a collection of probes that hybridize specifically to one or more SNP containing sequences. The array may comprise probes that correspond to different alleles of the SNP. One probe or probe set may hybridize specifically to a first allele of a SNP, but not hybridize significantly to other alleles of the SNP and a second probe set may be designed to hybridize to a second allele of a SNP but not hybridize significantly to other alleles. A hybridization pattern from the array indicates which of the alleles are present in the sample. An array may contain probe sets to interrogate, for example, from 1,000, 5,000, 10,000 or 100,000 to 2,000, 5,000, 10,000, 100,000, 1,000,000 or 3,000,000 different SNPs.
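One simple way to read a hybridization pattern from two allele-specific probe sets is to compare their signals as a ratio. The Python sketch below is an assumption-laden illustration, not the analysis used in the disclosure: the relative allele signal, the cutoffs and the function names are all hypothetical.

```python
# Hypothetical genotype call from two allele-specific probe set signals.
# Cutoff values are illustrative and would need empirical tuning.

def relative_allele_signal(signal_a, signal_b):
    """Fraction of the combined signal contributed by the A-allele probes."""
    total = signal_a + signal_b
    if total <= 0:
        return None
    return signal_a / total


def call_snp(signal_a, signal_b, hom_cutoff=0.8, het_low=0.35, het_high=0.65):
    """Return 'AA', 'AB', 'BB' or 'NoCall' for one SNP."""
    ras = relative_allele_signal(signal_a, signal_b)
    if ras is None:
        return "NoCall"
    if ras >= hom_cutoff:
        return "AA"
    if ras <= 1 - hom_cutoff:
        return "BB"
    if het_low <= ras <= het_high:
        return "AB"
    return "NoCall"                    # falls between the defined zones


for a, b in [(950, 50), (500, 520), (40, 960), (700, 300)]:
    print(a, b, call_snp(a, b))
```

Leaving a no-call zone between the homozygous and heterozygous ranges is a design choice in this sketch; it trades call rate for fewer ambiguous genotype assignments.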

In another embodiment an array of probes that are complementary to tag sequences present in the capture probes is used to interrogate the target sequences. In some embodiments the amplified targets are analyzed on an array of tag sequences, for example, the Affymetrix GenFlex® array (Affymetrix, Inc., Santa Clara, Calif.). In this embodiment the capture probes comprise a tag sequence that is unique for each species of capture probe. A detectable label that is indicative of the allele present at the polymorphic site of interest is associated with the tag. The labeled tags are hybridized to the one or more arrays and the hybridization pattern is analyzed to determine which alleles are present.

In another embodiment methods for generating a plurality of different oligonucleotides are disclosed. Oligonucleotides are synthesized in parallel on a solid support. The oligonucleotides are then released from the solid support and used for further analysis. The released probes may be used, for example, for multiplex PCR amplification of a collection of target sequences, for probes, for primers for reverse transcription or amplification or for any other use of oligonucleotides known in the art. In one embodiment the oligonucleotides on the solid support comprise a collection of capture probes.

In another embodiment kits that are useful for the present methods are disclosed. In one embodiment a kit for amplifying a collection of target sequences is disclosed. The kit may comprise one or more of the following: a collection of capture probes as disclosed, one or more adaptors, one or more generic primers for common sequences, one or more restriction enzymes, buffer, one or more polymerases, a ligase, dNTPs, ddNTPs, and one or more nucleases. The restriction enzyme of the kit may be a type-IIs enzyme. The capture probes may be attached to a solid support.

The methods of the presently claimed invention can be used for a wide variety of applications. Any analysis of genomic DNA may be benefited by a reproducible method of complexity management. Furthermore, the methods and enriched fragments of the presently claimed invention are particularly well suited for study and characterization of extremely large regions of genomic DNA.

In a preferred embodiment, the methods of the presently claimed invention are used for SNP discovery and to genotype individuals. For example, any of the procedures described above, alone or in combination, could be used to isolate the SNPs present in one or more specific regions of genomic DNA. Selection probes could be designed and manufactured to be used in combination with the methods of the invention to amplify only those fragments containing regions of interest, for example a region known to contain a SNP. Arrays could be designed and manufactured on a large scale basis to interrogate only those fragments containing the regions of interest. Thereafter, a sample from one or more individuals would be obtained and prepared using the same techniques which were used to prepare the selection probes or to design the array. Each sample can then be hybridized to an array and the hybridization pattern can be analyzed to determine the genotype of each individual or a population of individuals. Methods of use for polymorphisms and SNP discovery can be found, for example, in U.S. Pat. No. 6,361,947 and co-pending U.S. application Ser. No. 08/813,159, which are herein incorporated by reference in their entirety for all purposes.

EXAMPLES

Example 1

Msp Digestions of HTR2A PCR Product. HTR2A PCR products were taken from 16 individuals, digested with the restriction enzyme Msp and the samples were run on a gel. The genotypes of the 16 individuals were the following: 1 individual with a TT genotype, 7 individuals with a CT genotype, and 8 individuals with a CC genotype. Individuals with a CT genotype produce two distinct bands on the gel, the second of which aligns with the single band produced for individuals with a CC genotype. The first band of the CT genotype is higher on the gel than the CC genotype band. Individuals with a TT genotype have one distinct band that is higher on the gel than the band for individuals with a CC genotype. The experiment showed that Msp digestion worked on individuals with a CT genotype.

Example 2

Exonuclease Sensitivity of Primers. Allele-specific forward primers were run with different polymerases and phosphorothioate linkages. The primers were incubated with no enzyme, VENT polymerase, DEEP VENT polymerase, and DEEP VENT (exo-) polymerase with 0, 1 and 3 phosphorothioate linkages (all polymerases from New England Biolabs). All products were incubated at 72° C. for 45 minutes. The products were run on an 8M urea 15% acrylamide gel and stained with SYBR Green dye (Molecular Probes). No distinct bands were observed in the lanes where the primers had 0 phosphorothioate linkages in the VENT polymerase or DEEP VENT polymerase lanes. Distinct bands were found in other lanes. The experiment showed that allele-specific forward primers are resistant to 3′ to 5′ exonuclease activity only if they contain at least one phosphorothioate linkage.

Example 3

Phosphorothioate Linkages Increase Specificity of PCR. Samples from individuals with CC genotype, individuals with TT genotype, and no DNA controls were run on a gel, and these samples varied with 0, 1, and 3 phosphorothioate linkages. The reactions used Pfu Ultra at 55° C., 60° C., and 65° C. The results showed no bands in the no DNA lanes at all temperatures. Bands were visible in lanes with 1 or 3 phosphorothioate linkages with respect to CC genotype and TT genotype at all temperatures. However, no band was visible in the TT genotype lane with 0 phosphorothioate linkages at all temperatures. The results indicate phosphorothioate linkages help to increase the specificity of standard solution phase PCR.

Example 4

Cleavage from Solid Support. Two long allele-specific oligonucleotides necessary for the bead based approach were treated and run on a gel. The treatments were: no treatment, Streptavidin, UV light, Streptavidin+UV light, E. coli Endo V, and Endo V+Streptavidin. The products were run on a 6%, 8M urea acrylamide gel and stained with SYBR Green. This gel uses streptavidin (SA) gel shift after various types of treatments to show the modular nature of the oligos. The results showed that two bands were observed in the UV light+streptavidin and Endo V+Streptavidin lanes. The two bands indicated that photocleavage (UV light) and enzymatic cleavage (Endo V) worked in the presence of streptavidin on those particular DNA strands. Multiple bands were observed in the streptavidin lanes while single distinct bands were observed in all other lanes.

Example 5

Multiplexed Anchored Runoff Amplification. Genomic DNA was digested with MseI and ligated to an adaptor containing T7 promoter sequence as a priming site. The final concentration of the genomic DNA was 10 ng/μl in 1× T4 DNA Ligase Buffer. To generate extended capture probes 2.5 μl of adaptor ligated DNA, 2.5 μl 10× Taq Gold Buffer, 2 μl 25 mM MgCl₂, 2.5 μl 10× dNTPs, 5 μl of a 500 nM mixture of 150 different capture probes in TE buffer corresponding to 150 different forward primers from the HuSNP assay, 0.25 μl Perfect Match Enhancer, 0.25 μl AMPLITAQ GOLD enzyme (Applied Biosystems, Foster City, Calif.) and 10 μl of water were mixed to give a final reaction volume of 25 μl. The reaction was incubated at 95° C. for 6 min followed by 26 cycles of 95° C. for 30 sec, 68° C. for 2.5 min (decreasing 0.5° C. on each subsequent cycle) and 72° C. for 1 min, then to 4° C.
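The reaction set-up and touchdown profile above can be checked with simple arithmetic. The Python sketch below assumes the first cycle anneals at 68° C. and each later cycle is 0.5° C. lower (one reading of "decreasing 0.5° C. on each subsequent cycle"); the variable and function names are illustrative. The MgCl₂ figure counts only the added 25 mM stock, not any magnesium that may be present in the buffer.

```python
# Arithmetic check of the extension reaction in Example 5.
# Volumes are in microliters, taken from the text; names are illustrative.

components_ul = {
    "adaptor ligated DNA": 2.5,
    "10x Taq Gold Buffer": 2.5,
    "25 mM MgCl2": 2.0,
    "10x dNTPs": 2.5,
    "500 nM capture probe mix": 5.0,
    "Perfect Match Enhancer": 0.25,
    "AMPLITAQ GOLD enzyme": 0.25,
    "water": 10.0,
}
total_ul = sum(components_ul.values())


def final_concentration(stock, volume_ul, reaction_ul):
    """Dilute a stock concentration into the final reaction volume."""
    return stock * volume_ul / reaction_ul


def touchdown_temperatures(start_c=68.0, step_c=0.5, cycles=26):
    """Annealing/extension temperature for each cycle of the touchdown."""
    return [start_c - step_c * i for i in range(cycles)]


mgcl2_mm = final_concentration(25.0, 2.0, total_ul)      # added MgCl2, mM
probes_nm = final_concentration(500.0, 5.0, total_ul)    # probe mix, nM
temps = touchdown_temperatures()

print(total_ul, mgcl2_mm, probes_nm)
print(temps[0], temps[-1], len(temps))
```

Under these assumptions the components sum to the stated 25 μl, the added MgCl₂ is 2 mM, the capture probe mixture is 100 nM in total, and the touchdown step falls from 68° C. in the first cycle to 55.5° C. in the twenty-sixth.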

The extended capture probes were made double stranded by the addition of 0.25 μl of 1 μM T7 primer and incubation at 95° C. for 2 min, 55° C. for 2 min, 72° C. for 6 min, then to 4° C. The reaction was passed over a G-25 Sephadex column and 5 μl of 10× Exonuclease I Buffer (NEB) and 2 μl of Exonuclease I (NEB) were added and the reaction was incubated at 37° C. for 60 min, 80° C. for 20 min, then to 4° C. The products were purified over a Qiagen (Valencia, Calif.) mini-elute column and eluted with 10 μl EB Buffer.

Generic PCR was done as follows: 65.5 μl water, 10 μl 10× Taq Gold Buffer, 8 μl 25 mM MgCl₂, 10 μl 10× dNTPs, 1 μl 1 μM T3 primer, 1 μl 1 μM T7 primer, 3 μl DNA, 0.5 μl Perfect Match Enhancer and 1 μl AMPLITAQ GOLD enzyme (Applied Biosystems, Foster City, Calif.) were mixed in a 100 μl final reaction volume and incubated at 95° C. for 8 min, 40 cycles of 95° C. for 30 sec, 55° C. for 1 min, and 72° C. for 1 min, then 72° C. for 6 min and finally to 4° C.

An aliquot of the reaction was analyzed on a 2% agarose gel. The products were concentrated using Qiagen QIAquick columns and eluted with 10 μl EB Buffer. The products were fragmented, labeled and hybridized to an array under standard conditions and hybridization patterns were analyzed.

Example 6

Multiplexed Anchored Runoff Amplification with Biotin Enrichment. Prepare adaptor ligated genomic DNA as above. To generate extended capture probes 2.5 μl of adaptor ligated DNA, 2.5 μl 10× Taq Gold Buffer, 2 μl 25 mM MgCl₂, 0.5 μl 50× acGT (6 mM dATP, 6 mM dCTP, 10 mM dGTP, 10 mM dTTP), 5 μl of a 500 nM mixture of 150 different capture probes in TE buffer corresponding to 150 different forward primers from the HuSNP assay, 0.25 μl Perfect Match Enhancer, 0.25 μl AMPLITAQ GOLD enzyme (Applied Biosystems, Foster City, Calif.), 2 μl 1 mM Biotin-N6-dATP (Perkin Elmer, Boston, Mass.), 2 μl 1 mM Biotin-N4-dCTP (Perkin Elmer) and 8 μl of water were mixed to give a final reaction volume of 25 μl. The reaction was incubated at 95° C. for 6 min followed by 26 cycles of 95° C. for 30 sec, 68° C. for 2.5 min (decreasing 0.5° C. on each subsequent cycle) and 72° C. for 1 min, then to 4° C. Pass reaction over G-25 Sephadex column to remove unincorporated biotin-dNTPs.
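The reduced dATP and dCTP in the 50× acGT stock appear to be balanced by the added biotinylated nucleotides, which can be seen by working out the final concentrations. The Python sketch below uses only the volumes and stock concentrations stated above; the variable names are illustrative and concentrations are expressed in μM.

```python
# Final nucleotide concentrations in the 25 ul biotin-labeling reaction.
# Stock values come from the text (converted to micromolar).

REACTION_UL = 25.0


def final_um(stock_um, volume_ul, reaction_ul=REACTION_UL):
    """Final concentration (uM) after diluting a stock into the reaction."""
    return stock_um * volume_ul / reaction_ul


acgt_50x_um = {"dATP": 6000.0, "dCTP": 6000.0, "dGTP": 10000.0, "dTTP": 10000.0}

# 0.5 ul of the 50x acGT stock supplies the unlabeled nucleotides.
unlabeled = {name: final_um(conc, 0.5) for name, conc in acgt_50x_um.items()}

# 2 ul each of 1 mM Biotin-N6-dATP and 1 mM Biotin-N4-dCTP.
biotinylated = {"dATP": final_um(1000.0, 2.0), "dCTP": final_um(1000.0, 2.0)}

total = {name: unlabeled[name] + biotinylated.get(name, 0.0) for name in unlabeled}
biotin_fraction = {name: biotinylated[name] / total[name] for name in biotinylated}

for name in sorted(total):
    print(name, unlabeled[name], biotinylated.get(name, 0.0), total[name])
print(biotin_fraction)
```

On these numbers each of the four nucleotides is present at 200 μM in total, with 80 μM of the dATP and 80 μM of the dCTP (40% of each) carrying biotin.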

Enrich for biotinylated extension products. Adjust the G-25 eluate to 1× PCR buffer and 2 mM MgCl₂. Add 15 μl monoclonal anti-biotin agarose (Clone BN-34, Sigma). Incubate at room temperature for 30 min with gentle agitation. Spin down agarose resin for 3 min at 5,000 rpm. Aspirate away supernatant and wash agarose resin with 250 μl 1× PCR buffer with 2 mM MgCl₂. Aliquot agarose resin into PCR tubes for generic PCR with T3 and T7 primers.

Generic PCR was done as follows: 65.5 μl water, 10 μl 10× Taq Gold Buffer, 8 μl 25 mM MgCl₂, 10 μl 10× dNTPs, 1 μl 1 μM T3 primer, 1 μl 1 μM T7 primer, 3 μl DNA, 0.5 μl Perfect Match Enhancer and 1 μl AMPLITAQ GOLD enzyme (Applied Biosystems, Foster City, Calif.) were mixed in a 100 μl final reaction volume and incubated at 95° C. for 8 min, 40 cycles of 95° C. for 30 sec, 55° C. for 1 min, and 72° C. for 1 min, then 72° C. for 6 min and finally to 4° C.

An aliquot of the reaction was analyzed on a 2% agarose gel. The products were concentrated using Qiagen QIAQUICK columns and eluted with 30 μl EB Buffer. The products were fragmented with DNase I, labeled with biotin-ddATP using TdT, and hybridized to an array under standard conditions. Hybridization patterns were analyzed.

Example 7

Multiplexed Anchored Runoff Amplification with Exo III Enrichment. Prepare adaptor ligated genomic DNA as above. Kinase capture probes by incubating 12 μl of a 150-plex stock of either forward or reverse HuSNP primers (Affymetrix) with 12.7 μl H₂O, 3 μl 10× T4 polynucleotide kinase buffer, 0.3 μl 100 mM ATP, and 2 μl T4 Polynucleotide Kinase. Incubate the reaction at 37° C. for 30 min. Adjust reaction volume to 50 μl and pass reaction over G-25 column to exchange buffer.

To generate extended capture probes 5 μl of adaptor ligated DNA, 5 μl 10× Taq Gold Buffer, 4 μl 25 mM MgCl₂, 5 μl 10× dNTPs, 20 μl of the kinased mixture of 150 different capture probes, 1 μl Perfect Match Enhancer, 0.5 μl AMPLITAQ GOLD enzyme (Applied Biosystems, Foster City, Calif.) and 9.5 μl of water were mixed to give a final reaction volume of 50 μl. The reaction was incubated at 95° C. for 6 min followed by 26 cycles of 95° C. for 30 sec, 68° C. for 2.5 min (decreasing 0.5° C. on each subsequent cycle) and 72° C. for 1 min, then finally to 4° C. Pass the reaction over a G-25 column to exchange buffer.

Convert the single strand extension products to single strand circles using splint oligonucleotides and AMPLIGASE Thermostable DNA Ligase (Epicenter, Madison, Wis.). The sequence of the T3-T7 splint oligo is (SEQ ID NO: 3)

5′-TCTCCCTTTAGTGAGGGTTAATTTGTAATACGACTCACTATAGGGCA-3′.

Mix 39.75 μl water, 7.5 μl 10× AMPLIGASE Buffer, 1.25 μl 70 μM splint oligo, 25 μl 5′ phosphorylated single strand extension products and 1.5 μl AMPLIGASE Thermostable DNA Ligase 5 U/μl. Incubate the mixture at 95° C. for 3 min, then 10 cycles of 95° C. for 30 sec and 72° C. for 3 min, then 10 cycles of 95° C. for 30 sec and 70° C. for 3 min, then 10 cycles of 95° C. for 30 sec and 68° C. for 3 min, then 10 cycles of 95° C. for 30 sec and 66° C. for 3 min, then 10 cycles of 95° C. for 30 sec and 64° C. for 3 min, then 10 cycles of 95° C. for 30 sec and 62° C. for 3 min. Hold at 4° C. Pass reaction over G-25 column to exchange buffer. Digest uncircularized nucleic acids. Mix 13 μl water, 10 μl 10× Exo III Buffer, 75 μl AMPLIGASE splint reaction and 2 μl Exonuclease III 100 U/μl (NEB, Beverly, Mass.). Incubate at 37° C. for 1 hour. Heat inactivate at 70° C. for 20 min. Fragment, label and hybridize as above.
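The relationship between a splint and the two ends it brings together can be illustrated with the splint sequence given above. In the Python sketch below, the split of SEQ ID NO: 3 into a 23-base arm and a 24-base arm is an assumption based on the "T3-T7" name and on the resemblance of the arms to T3 and T7 promoter sequences; the function name `design_splint` is illustrative.

```python
# Relating a splint oligonucleotide to the ends of the strand it circularizes.
# The 23/24 split of the splint into two arms is an assumption.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_splint(five_prime_end: str, three_prime_end: str) -> str:
    """Splint that holds the 3' end of a strand next to its own 5' end.

    Read 5' to 3' across the ligation junction, the circular strand is
    three_prime_end followed by five_prime_end, so the splint is the
    reverse complement of that joined sequence.
    """
    return reverse_complement(three_prime_end + five_prime_end)


SPLINT = "TCTCCCTTTAGTGAGGGTTAATTTGTAATACGACTCACTATAGGGCA"  # SEQ ID NO: 3

# Ends of the extended capture probe implied by the two splint arms.
probe_5prime_end = reverse_complement(SPLINT[:23])   # T3-like common sequence
probe_3prime_end = reverse_complement(SPLINT[23:])   # complement of T7 adaptor

print(probe_5prime_end)
print(probe_3prime_end)
print(design_splint(probe_5prime_end, probe_3prime_end) == SPLINT)
```

Read this way, the 5′ arm of the splint pairs with the common sequence at the 5′ end of the extended capture probe and the 3′ arm pairs with the copied adaptor sequence at its 3′ end, which places the 5′ phosphate next to the 3′ hydroxyl for ligation.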

From the foregoing it can be seen that the present invention provides a flexible and scalable method for analyzing complex samples of DNA, such as genomic DNA. These methods are not limited to any particular type of nucleic acid sample: plant, bacterial, animal (including human) total genome DNA, RNA, cDNA and the like may be analyzed using some or all of the methods disclosed in this invention. This invention provides a powerful tool for analysis of complex nucleic acid samples. From experiment design to isolation of desired fragments and hybridization to an appropriate array, the above invention provides for fast, efficient and inexpensive methods of complex nucleic acid analysis.

All patents, publications and patent applications cited above are incorporated by reference in their entirety for all purposes to the same extent as if each individual patent, publication or patent application were specifically and individually indicated to be so incorporated by reference. Although the present invention has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.

1. A method of amplifying a collection of target sequences from a nucleic acid sample the method comprising: generating a collection of capture probes comprising a plurality of different species of primers wherein each species comprises a first common sequence that is common to each capture probe in the collection of capture probes and a 3′ variable region that is specific for a target sequence in the collection of target sequences; fragmenting the nucleic acid sample; ligating an adaptor comprising a second common sequence to the fragments, wherein the adaptor is ligated to the fragments so that the strand that is ligated to the 5′ end of the fragment strands comprises the second common sequence and the strand that is ligated to the 3′ end of the fragments lacks the complement of the second common sequence and is blocked from extension at the 3′ end; hybridizing the adaptor-ligated fragments to the collection of capture probes; extending the capture probes to obtain extension products; and amplifying the extended capture probes with first and second common sequence primers to obtain an amplified collection of target sequences.
 2. The method of claim 1 wherein an amino group is used to block extension at the 3′ end of the adaptor strand that is ligated to the 3′ end of the fragments.
 3. A method of analyzing a nucleic acid sample comprising: amplifying a collection of target sequences from the nucleic acid sample according to the method of claim 1; hybridizing the amplified collection of target sequences to an array; and analyzing the hybridization pattern to detect the presence or absence of target sequences from the collection of target sequences.
 4. A method of genotyping one or more polymorphic locations in a sample comprising: preparing an amplified collection of target sequences from the sample according to the method of claim 1; hybridizing the amplified collection of target sequences to an array designed to interrogate at least one polymorphic location in the collection of target sequences; and analyzing the hybridization pattern to determine the identity of the allele or alleles present at one or more polymorphic location in the collection of target sequences.
 5. A method for analyzing sequence variations in a population of individuals comprising: obtaining a nucleic acid sample from each individual; amplifying a collection of target sequences from each nucleic acid sample according to the method of claim 1; hybridizing each amplified collection of target sequences to an array designed to interrogate sequence variation in the collection of target sequences to generate a hybridization pattern for each sample; and analyzing the hybridization patterns to determine the presence or absence of sequence variation in the population of individuals.
 6. The method of claim 1 wherein the nucleic acid sample is fragmented by digestion with one or more restriction enzymes.
 7. The method of claim 1 wherein prior to amplification the extension products are enriched in the sample to be amplified.
 8. The method of claim 1 wherein labeled nucleotides are incorporated into the extension products and the extension products are enriched by affinity chromatography.
 9. The method of claim 8 wherein the labeled nucleotides are labeled with biotin and avidin, streptavidin or an anti-biotin antibody is used to isolate extension products.
 10. The method of claim 1 wherein prior to amplification the extended capture probes are made double stranded and single stranded nucleic acid in the sample is digested.
 11. The method of claim 10 wherein the single stranded nucleic acid in the sample is digested with a nuclease.
 12. The method of claim 11 wherein the nuclease is Exonuclease I.
 13. The method of claim 1 wherein prior to amplification the extended capture probes are circularized and uncircularized nucleic acid in the sample is digested.
 14. The method of claim 13 wherein extended capture probes are circularized by a method comprising: hybridizing an oligonucleotide splint to the extended capture probes, wherein the splint is complementary to the first and second common sequences, thereby juxtaposing the 5′ and 3′ ends of extended capture probes; and ligating the ends of the extended capture probes to form circular extended capture probes.
 15. The method of claim 13 wherein the uncircularized nucleic acid remaining in the sample is digested with a nuclease.
 16. The method of claim 15 wherein the nuclease is Exonuclease III.
 17. The method of claim 1 wherein there are 100 to 1,500 different target sequences in the collection of target sequences.
 18. The method of claim 1 wherein there are 1,000 to 10,000 different target sequences in the collection of target sequences.
 19. The method of claim 1 wherein there are 10,000 to 1,000,000 different target sequences in the collection of target sequences.
 20. A method of reducing the genetic complexity of a population of nucleic acid molecules and analyzing the sequence of nucleic acids in the reduced complexity sample, the method comprising the steps of: (a) providing on a solid support target fragments of the population captured by hybridization of a fragmented nucleic acid sample to a plurality of capture probes that are covalently attached to a solid support in known locations; (b) separating unbound and non-specifically hybridized nucleic acids from the captured target fragments; (c) amplifying the captured target fragments to obtain a reduced complexity sample; and (d) analyzing a plurality of the amplified target fragments in the reduced complexity sample to determine at least some sequence of the nucleic acids.
 22. The method according to claim 20 further comprising the step of ligating an adaptor molecule to at least one end of the target fragments prior to step (a).
 23. The method according to claim 22 further comprising the step of amplifying the nucleic acid molecules with at least one primer that comprises a sequence that specifically hybridizes to the sequence of the adaptor molecule.
 24. The method according to claim 23 wherein the probes are selected from the group consisting of a plurality of probes that defines a plurality of exons, introns or regulatory sequences from a plurality of genetic loci, a plurality of probes that defines a complete sequence of at least one single genetic locus, a plurality of probes that defines sites known to contain single nucleotide polymorphisms (SNPs), and a plurality of probes that defines an array designed to capture the complete sequence of at least one complete chromosome.
 25. The method according to claim 24 wherein the at least one single genetic locus has a size selected from the group consisting of between 100 and 10,000 base pairs or between 4,000 and 500,000 base pairs.
 26. A method for determining nucleic acid sequence information about at least one region of nucleic acid, the method comprising the steps of (i) reducing the genetic complexity of a population of nucleic acid molecules according to a method comprising the steps of: (a) providing on a solid support target fragments of the population captured by specific hybridization to a plurality of capture probes wherein the fragmented target molecules are between 100 and 10,000 base pairs in size, by hybridizing fragmented target molecules to a solid support having a plurality of capture probes covalently attached thereto, (b) separating unbound and non-specifically hybridized nucleic acids from the captured target molecules; (c) amplifying the captured target molecules and eluting the captured target molecules from the solid support; and (ii) determining the nucleic acid sequence of the captured target molecules by sequencing at least a portion of the amplified target molecules.
 27. The method according to claim 26 wherein the fragmented target molecules are ligated to an adaptor molecule prior to hybridization to the capture probes and wherein the amplification is performed with at least one primer that comprises a sequence that specifically hybridizes to the sequence of the adaptor molecule.
 28. A kit comprising adaptor molecules, and a plurality of capture probes on a solid support, wherein the targets are selected from the group consisting of a plurality of coding regions or regulatory sequences from a plurality of genetic loci and a plurality of sites known to contain SNPs and further comprising at least one additional component selected from the group consisting of DNA polymerase, T4 polynucleotide kinase, T4 DNA ligase, a hybridization buffer, a wash buffer, and an elution buffer.
 29. A method for reducing the complexity of a genomic sample to obtain a sample that is enriched for a collection of target sequences comprising: (i) hybridizing a collection of target-specific primers to a genomic DNA sample comprising adaptor modified fragments, wherein each target-specific primer comprises a 5′ first common sequence that is not complementary to the target sequence and a 3′ region that is complementary to the target sequence; (ii) extending the target-specific primers using the adaptor modified fragments as template to obtain primer extension products that have a second common sequence at the 3′ end of at least some of the extension products; (iii) annealing a splint oligonucleotide that comprises the complement of the first common sequence and the complement of the second common sequence to the extension products, wherein the splint oligonucleotide brings the 5′ and 3′ ends of the extension product together; (iv) ligating the 5′ and 3′ ends of the extension products to form a circular extension product; (v) treat with exonuclease VII to remove uncircularized nucleic acid; and (vi) amplify with PCR.
 30. A method of genotyping a plurality of polymorphisms in a nucleic acid sample comprising: (a) mixing the nucleic acid sample with a plurality of pairs of capture probes wherein each pair of capture probes consists of a first and a second allele-specific primer, wherein the first allele-specific primer is complementary to a first allele of a selected polymorphism and includes the polymorphic position and the second allele-specific primer is complementary to a second allele of the selected polymorphism and includes the polymorphic position, wherein said first and second allele-specific primers are resistant to 3′ exonuclease activity; (b) subjecting the mixture from a) to a primer extension reaction, wherein said first and second allele-specific primers are extended with a DNA polymerase comprising a proofreading activity to generate first and second allele-specific extension products in the presence of the first and second alleles, respectively; and (c) detecting the presence of said first and second allele-specific extension products, wherein the presence of a first allele-specific extension product is indicative of the presence of the first allele of a selected polymorphism and presence of a second allele-specific extension product is indicative of the presence of the second allele.
 31. The method of claim 30 wherein said first and second allele-specific primers comprise 1, 2 or 3 phosphorothioate linkages at the 3′ end.
 32. The method of claim 30 wherein said first and second allele-specific primers comprise a locked nucleic acid linkage at the 3′ end, wherein said linkage is between the terminal 3′ base and the penultimate base or between the penultimate base and the base immediately 5′ of the penultimate base.
 33. The method of claim 30 wherein the nucleic acid sample is an amplified genomic DNA sample obtained by amplifying the nucleic acid sample using random primers and DNA polymerase in an isothermal amplification reaction to obtain an amplified nucleic acid sample and further comprising fragmenting the amplified nucleic acid sample prior to step (a).
 34. A method for genotyping a plurality of polymorphisms in a nucleic acid sample, where each polymorphism has a first and a second allele, comprising: incubating the nucleic acid sample with a plurality of beads attached to allele-specific capture probes to allow formation of complexes between target fragments and allele-specific capture probes, wherein the allele-specific capture probes comprise: (i) a linker, (ii) a cleavage region, (iii) a tag region, wherein each different allele-specific probe has a different sequence tag region and wherein the tag region is at least 15 bases; and, (iv) a target-specific region that terminates at its 3′ end with a base that is complementary to a polymorphic base in the target; extending the allele-specific capture probes in the presence of labeled nucleotides using the target fragment as template to obtain labeled allele-specific capture probes, wherein extension of the allele-specific capture probes is blocked if there is a mismatch between the polymorphic position and the 3′ end of the allele-specific capture probe; separating the target fragment from the labeled allele-specific capture probes; cleaving at least a portion of the labeled allele-specific capture probes from the beads; detecting the released labeled allele-specific capture probes by hybridization to an array of tag probes, wherein said tag probes of known sequence are present at known or determinable locations on said array and each tag probe is complementary to a different tag region present in the allele-specific capture probes; and determining the genotype of said plurality of polymorphisms by determining which alleles are present, wherein the presence of hybridized labeled allele-specific capture probes is indicative of the presence of a particular allele in the nucleic acid sample.
 35. The method of claim 34 wherein said step of cleaving comprises an enzymatic step.
 36. The method of claim 35 wherein the cleavage region comprises one or more uracil bases and cleavage comprises incubation with a uracil DNA glycosylase to generate abasic sites and the abasic sites are cleaved with an AP endonuclease or with acid or heat treatment.
 37. The method of claim 35 wherein the cleavage region comprises one or more inosines and cleavage comprises incubation with an Endonuclease V.
 38. The method of claim 34 wherein said step of cleaving comprises photo cleavage of a light sensitive linkage in the capture probe.
 39. A method for genotyping a plurality of polymorphisms in a nucleic acid sample comprising: fragmenting the nucleic acid sample to generate fragments; hybridizing a collection of capture probes to target fragments, wherein said capture probes are attached to a solid support at a 5′ end and each comprises: (i) a spacer sequence near said 5′ end, (ii) a dU region comprising a plurality of uracil residues, (iii) a tag sequence of at least 15 bases that is unique for each species of capture probe, (iv) a target-specific sequence, and (v) an allele-specific nucleotide corresponding to one allele of a polymorphism in said plurality of polymorphisms, wherein each capture probe terminates at its 3′ end with said allele-specific nucleotide; extending said capture probes with a DNA polymerase to generate extended capture probes in an allele-specific extension reaction; washing the solid support to remove the target fragments; cleaving the extended capture probes from the solid support by a method comprising photo cleavage or enzymatic cleavage; hybridizing the extended capture probes to an array comprising a plurality of tag probe features, wherein each tag probe feature comprises a different tag probe and wherein said tag probes are complementary to the tag sequences of the capture probes; and detecting the presence of capture probes wherein the presence of an extended capture probe complementary to a selected allele is indicative of the presence of that allele in the nucleic acid sample.
 40. The method of claim 39 wherein the solid support comprises a plurality of beads.
 41. The method of claim 40 wherein the beads are coated with anti-digoxigenin, and the capture probes comprise a digoxigenin label.
 42. The method of claim 39 wherein the capture probes are exonuclease resistant at the 3′ end and the DNA polymerase has 3′ to 5′ exonuclease proof-reading activity.
 43. The method of claim 42 wherein the DNA polymerase is selected from the group consisting of T4 DNA polymerase, E. coli Klenow fragment, and T7 DNA polymerase.
 44. The method of claim 39 wherein the cleavage from the solid support is enzymatic and comprises cleavage with an endonuclease.
 45. The method of claim 39 wherein the cleavage from the solid support is enzymatic and comprises treatment with uracil DNA glycosylase and heat or acid treatment.
 46. The method of claim 39, wherein the cleavage from the solid support is photocleavage and comprises exposure to UV light with a wavelength between 200 and 400 nanometers.
 47. The method of claim 39 wherein the dU region comprises a plurality of inosine residues and the cleavage step comprises cleavage with Endonuclease V.
 48. The method of claim 47 wherein the dU region comprises UIUI.
 49. The method of claim 39 wherein the 3′ end of the capture probes comprises 0, 1, or 3 phosphorothioate linkages.